Teams often struggle with manual content assembly – compiling text, images, and PDFs for documents like e-books or reports can be a slow, error-prone process. Generative AI offers a way to automate this, but wiring together multiple AI outputs (text, images) into polished documents still requires coordination. That’s where a multi-modal book writer agent comes in. In this deep-dive, we’ll show how to build an agent that automatically:
Generates a structured outline for a book given a topic
Writes each chapter’s text using an LLM (large language model)
Creates an AI-driven illustration for each chapter
Assembles chapters into a formatted PDF file per chapter
Uploads all files as attachments to a UiPath Orchestrator job for tracking
This solution leverages LlamaIndex and UiPath coded agents together. LlamaIndex is an open-source framework that simplifies building LLM applications (handling embeddings, vector stores, prompt templates, etc.), letting you focus on business logic. UiPath coded agents allow developers to write Python-based automations, package them via the UiPath CLI, and deploy to UiPath Orchestrator – benefiting from full code-level control, enterprise-grade governance (RBAC, audit logs, human-in-the-loop), and built-in observability. By marrying LlamaIndex’s intelligence at the edge with UiPath’s robust orchestration, we can automate end-to-end document generation without sacrificing flexibility or control.
Most script-based document generation stops at plain text or demands heavy manual assembly. In contrast, our Multi-Modal Book Writer Agent automates every step from outline to final PDF. This means you can scale effortlessly – generating full multi-chapter books on any topic with a consistent structure – all while maintaining enterprise compliance (every file is tracked as a job attachment for auditability). It’s easy to iterate and improve as well: update a prompt or template and redeploy via a single CLI command. Example use cases include on-demand e-learning content, whitepapers, or knowledge base articles generated with minimal human effort.
At a high level, the automation works as follows:
Input: The process starts when you provide a topic (and a desired number of chapters). This is the only manual input needed.
Outline Generation: The agent uses an LLM (through LlamaIndex) to generate a structured book outline – essentially a JSON list of chapters with titles and short descriptions.
Chapter Generation: For each chapter in the outline, the agent:
Drafts the chapter text by prompting the LLM with the chapter title and description.
Creates an AI-generated illustration for the chapter (using the OpenAI image API in this example).
Composes a PDF for the chapter by combining the text and image into a nicely formatted document.
Saves the PDF (and optionally the image) to disk or keeps it in memory.
Attachment Upload: After all chapters are processed, the agent uploads each chapter’s PDF to UiPath Orchestrator as job attachments. (Attachments make it easy to retrieve outputs later and provide an audit trail.)
Completion: The Orchestrator job ends, and you have a set of chapter PDFs (one per chapter) attached to the job, ready to download or share.
This workflow is orchestrated by a single UiPath coded agent (Python) using LlamaIndex for the heavy LLM lifting. The architecture ensures a clean separation of concerns: outline retrieval → content generation → PDF packaging → file upload. Next, we’ll walk through how to set up and run the agent, then dive into the code for each step.
Before running the book writer agent, make sure you have the following in place:
Python 3.10+ installed (the UiPath Python SDK requires Python ≥ 3.10).
UiPath LlamaIndex SDK installed: pip install uipath-llamaindex. This provides the UiPathAgent class and integration with LlamaIndex.
ReportLab library installed for PDF generation: pip install reportlab.
An OpenAI API key for text and image generation (set as OPENAI_API_KEY). The example uses OpenAI’s GPT for text and the DALL-E API for images.
Access to a UiPath Orchestrator (Automation Cloud) and the UiPath CLI. You’ll need your Orchestrator account URL and an access token so the agent can upload attachments. You can generate a user access token in Orchestrator or use the uipath auth CLI command to log in.
Project Code: The full source code is available in the UiPath LlamaIndex Python GitHub repo. It’s recommended to clone this repository (rather than copying code from here) so you have the correct file structure and dependencies. After cloning, navigate to the samples/multi-modal-book-writer-agent directory.
Configure environment variables: In the project folder, create a .env file (or use environment variables) with the required keys. For example:
UIPATH_URL=https://cloud.uipath.com/YourAccountName/YourTenantName
UIPATH_ACCESS_TOKEN=YOUR_UIPATH_ORCH_ACCESS_TOKEN
OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX
This .env file will be read by the code to get your Orchestrator URL/token and OpenAI key. (If you used uipath auth, the CLI may have stored a token for packing/publishing, but the running agent itself will still need these values at runtime.)
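If you run the agent code directly, outside the UiPath CLI, you may need to load these values yourself. Below is a minimal sketch using the python-dotenv package; this is an assumption for local experimentation, as the SDK or CLI may already handle loading for you.
# Hedged sketch: load the .env file explicitly when running the script standalone
from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the current directory into environment variables
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
assert os.getenv("UIPATH_URL") and os.getenv("UIPATH_ACCESS_TOKEN"), "UiPath credentials are not set"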
Once prerequisites are ready and the code is cloned and configured, follow these steps to deploy the agent to Orchestrator:
Initialize the project (if not already done): If you started from scratch, you can initialize a new agent project with the UiPath CLI. For example, run uipath init book_writer in the project directory to create a project manifest. (If you cloned the repo, this is likely already set up.)
Authenticate with Orchestrator: Ensure the CLI is connected to your Orchestrator tenant. Run uipath auth and follow the prompts to log in. This will allow the CLI to pack and publish to your Automation Cloud. As mentioned above, it will also populate the .env file with the UiPath environment data so it can run locally.
Pack the automation: Run uipath pack. This packages the Python project into a .nupkg file that includes all code and dependencies. You should see a success message indicating the package was created (e.g., book_writer.1.0.0.nupkg).
Publish to Orchestrator: Run uipath publish. This uploads the package to your Orchestrator feed so it can be executed by a Robot/Runner. You will be prompted to choose whether to publish to the tenant feed or to your personal workspace. After publishing, you should see the process (named “book_writer” if you used that name) available in Orchestrator.
Run the agent: You can trigger the agent in multiple ways – via UiPath Orchestrator (start a job on the process, or schedule it), via UiPath Studio (using the Run ribbon if connected), or via the CLI. For example, using the CLI you could run:
uipath run book_writer --input '{"topic": "Space Exploration", "num_chapters": 4}'
This command triggers the published book_writer agent with a JSON input specifying the book's topic and number of chapters. The agent executes on a UiPath Robot, and upon completion you can inspect the job in Orchestrator to find the generated PDF attachments, one per chapter, named after the chapter (e.g., chapter_1.pdf, chapter_2.pdf, etc.). Alternatively, you can run the agent from Orchestrator directly: once the package is published to a tenant feed, deploy it as a process in a folder and configure its environment settings with the same keys (and values) as the local .env file described above. You can also specify default input values for the topic and number of chapters.
Now, let’s walk through the key sections of the Python code that powers this agent. We’ll break it down into logical parts: initialization, outline generation, chapter generation (text + image), PDF assembly, and attachment upload. The code below is a simplified version of what’s in the GitHub repo – for full details, refer to the repository, but the snippets here capture the core logic.
First, we import the necessary libraries and set up the agent configuration:
import os, io, json, base64
from reportlab.platypus import SimpleDocTemplate, Paragraph, Image, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from uipath_llamaindex import UiPathAgent
from openai import OpenAI

# --- Configuration ---
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
UIPATH_CLOUD_URL = os.getenv("UIPATH_URL")
UIPATH_TOKEN = os.getenv("UIPATH_ACCESS_TOKEN")

# Initialize UiPath coded agent
agent = UiPathAgent(
    llamaindex_config="llama_index.json",
    orchestrator_url=UIPATH_CLOUD_URL,
    access_token=UIPATH_TOKEN
)
Let’s unpack this: we load environment variables for the OpenAI API key and UiPath Orchestrator credentials. These will be used to authenticate our LLM calls and to allow the agent to send attachments back to Orchestrator. We then instantiate a UiPathAgent object.
This object comes from the uipath-llamaindex SDK and ties together the LlamaIndex framework with UiPath. We pass in llamaindex_config="llama_index.json", a configuration file defining our LlamaIndex setup (e.g., which LLM to use, any retrieval settings, etc.). We also provide the orchestrator_url and access_token so that the agent knows how to connect to UiPath Orchestrator – this is what enables calling agent.attach_file() later to upload outputs. At this point, the agent is initialized and ready to run workflows using the LlamaIndex LLM interface (agent.llm) and to interact with Orchestrator.
The first step in the workflow is generating a book outline based on the input topic. We define a function generate_outline for this:
# --- Step 1: Generate Outline ---
def generate_outline(topic, num_chapters):
    prompt = (
        f"Create a {num_chapters}-chapter book outline on '{topic}'.\n"
        "Return JSON: [{ 'title': ..., 'description': ... }, ...]"
    )
    return json.loads(agent.llm.chat(prompt))
This function constructs a prompt asking the LLM to “Create a {num_chapters}-chapter book outline on ‘{topic}’” and explicitly instructs it to return a JSON array of objects, where each object has a chapter title and description. Having the LLM output machine-readable JSON makes it easy to parse and work with the outline structure in code. The agent.llm.chat(prompt) call sends the prompt to the LLM (as configured by LlamaIndex, typically an OpenAI GPT model) and returns the model’s response. We then use json.loads to parse the response into a Python list. For example, if the topic was “Space Exploration” and num_chapters=3, the LLM might return JSON like:
[
  {"title": "The Dawn of Spaceflight", "description": "How humanity first reached space..."},
  {"title": "Exploring the Solar System", "description": "Robotic missions and their discoveries..."},
  {"title": "The Future of Space Travel", "description": "Visions of interstellar exploration..."}
]
Our generate_outline function would parse this into a Python list of dicts for use in the next step.
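One practical caveat: LLMs don’t always return perfectly clean JSON (they sometimes wrap the answer in markdown fences or add commentary). A small defensive parsing helper, shown here as a sketch assuming agent.llm.chat returns a plain string as in the snippet above, keeps the outline step robust:
# Hedged sketch: defensively parse the outline JSON returned by the LLM
import json, re

def parse_outline(raw: str):
    # Strip markdown code fences if the model wrapped its answer in them
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the first JSON array found anywhere in the response
        match = re.search(r"\[.*\]", cleaned, flags=re.DOTALL)
        if match is None:
            raise ValueError("LLM response did not contain a JSON outline")
        return json.loads(match.group(0))
You could then swap the bare json.loads(...) in generate_outline for parse_outline(...).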
With an outline in hand, the agent next generates the content for each chapter. This involves two modalities: text (chapter narrative) and an image (illustration). We handle both in a single function generate_chapter, which takes a chapter definition (title & description) and the chapter index:
# --- Step 2: Generate Chapter ---
def generate_chapter(chap, idx):
    # (a) Draft text content
    text_prompt = (
        f"Write Chapter {idx+1}: '{chap['title']}'.\n"
        f"Description: {chap['description']}\n\nChapter content:"
    )
    content = agent.llm.chat(text_prompt)

    # (b) Create illustration using OpenAI Image API
    client = OpenAI(api_key=OPENAI_API_KEY)
    img_resp = client.images.generate(
        prompt=f"Illustration for chapter titled '{chap['title']}'",
        size="512x512",
        response_format="b64_json"
    )
    # Decode the base64-encoded image into raw bytes for ReportLab
    img_data = io.BytesIO(base64.b64decode(img_resp.data[0].b64_json))

    # (c) Compose PDF with text and image
    pdf_buffer = io.BytesIO()
    doc = SimpleDocTemplate(pdf_buffer)
    styles = getSampleStyleSheet()
    story = [Paragraph(chap['title'], styles['Title']), Spacer(1, 12)]
    story.append(Paragraph(content, styles['BodyText']))
    story.append(Spacer(1, 12))
    # Scale the illustration so it fits inside the default page frame
    story.append(Image(img_data, width=400, height=400))
    doc.build(story)
    pdf_buffer.seek(0)
    return pdf_buffer
Let’s break this down. In part (a), we format a text_prompt that instructs the LLM to “Write Chapter {idx+1}: ‘{title}’”, followed by the chapter’s description from the outline and a trailing “Chapter content:” cue. The title and description guide the LLM, and the cue asks it to produce the chapter text. We call agent.llm.chat(text_prompt) and store the result in content – this is the raw text of the chapter.
In part (b), we generate an image to accompany the chapter, using OpenAI’s image generation API via the Python SDK. We create an OpenAI client with our API key and call images.generate with a prompt like “Illustration for chapter titled ‘{title}’”, a size of 512x512 pixels, and response_format="b64_json" so the image comes back base64-encoded (accessible as img_resp.data[0].b64_json). We decode that base64 string into raw bytes and wrap it in an io.BytesIO buffer (img_data), since ReportLab’s Image flowable accepts a file-like object containing the image data.
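If you’d rather not handle base64 at all, the DALL-E image endpoint can also return a hosted URL that you download yourself. A sketch of that variant, using the standard library for the download, might look like this:
# Hedged sketch: fetch the illustration via a URL instead of base64
import io
import urllib.request
from openai import OpenAI

def fetch_illustration(title: str) -> io.BytesIO:
    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    resp = client.images.generate(
        prompt=f"Illustration for chapter titled '{title}'",
        size="512x512",
        response_format="url",
    )
    with urllib.request.urlopen(resp.data[0].url) as response:
        return io.BytesIO(response.read())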
With both the text content and image img_data ready, part (c) handles PDF composition. We create a SimpleDocTemplate using ReportLab, which will write into an in-memory pdf_buffer. We get a sample stylesheet for some default text styles. Then we build a story list – a sequence of elements to add to the PDF. We add a title paragraph (using the chapter title and a Title style), a spacer, a paragraph for the chapter text (BodyText style), another spacer, and finally the image (with an explicit width and height so the 512-pixel illustration fits within the page frame). ReportLab’s doc.build(story) takes this list of flowable elements and renders them into the PDF buffer. We seek back to the start of the buffer and return it. At this point, the function returns an in-memory PDF (as a BytesIO object) containing the fully formatted chapter, including text and the illustration.
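A small refinement worth considering (not part of the original sample): if the LLM returns multi-paragraph text, splitting it on blank lines and emitting one Paragraph flowable per paragraph lets ReportLab space and paginate the body more naturally:
# Hedged sketch: turn multi-paragraph LLM output into separate ReportLab flowables
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, Spacer

def body_flowables(content: str):
    styles = getSampleStyleSheet()
    flowables = []
    for para in content.split("\n\n"):
        if para.strip():
            flowables.append(Paragraph(para.strip(), styles["BodyText"]))
            flowables.append(Spacer(1, 8))
    return flowables
Inside generate_chapter, story.extend(body_flowables(content)) would then replace the single BodyText paragraph.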
This generate_chapter function is called for every chapter in the outline. Each call produces one PDF buffer. We deliberately kept text and image generation together so that we can intermix or modify them together per chapter (e.g., you could imagine the text content influencing the image prompt or vice versa).
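When iterating on prompts locally, it can also be handy to write each buffer to disk before wiring up the Orchestrator upload. A quick sketch using the functions and agent defined above (it still requires the OpenAI key and UiPath values to be set in your environment):
# Hedged sketch: dump generated chapter PDFs locally for quick inspection
outline = generate_outline("Space Exploration", 3)
for idx, chap in enumerate(outline):
    pdf_buf = generate_chapter(chap, idx)
    with open(f"chapter_{idx + 1}.pdf", "wb") as f:
        f.write(pdf_buf.getvalue())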
After generating all chapters, the final step is to assemble the results and upload them to Orchestrator. The agent’s main workflow function (book_writer) orchestrates this:
# --- Main Workflow ---
@agent.workflow
def book_writer(topic: str, num_chapters: int = 3):
    outline = generate_outline(topic, num_chapters)
    attachments = []
    for idx, chap in enumerate(outline):
        pdf_buf = generate_chapter(chap, idx)
        filename = f"chapter_{idx+1}.pdf"
        attachments.append(("application/pdf", filename, pdf_buf.getvalue()))
    # Upload each PDF as job attachment
    for mime, name, content in attachments:
        agent.attach_file(filename=name, content=content, mime_type=mime)
    return {"status": "complete", "chapters": len(outline)}
The @agent.workflow decorator designates book_writer as a workflow entrypoint that can be triggered (this is the name we used in the CLI run command). When executed, it first calls generate_outline(topic, num_chapters) to get the list of chapters. Then it iterates over each chapter definition, calls generate_chapter(chap, idx) to get a PDF, and collects these into an attachments list. Each attachment is a tuple of MIME type, file name, and file content bytes. We name each PDF "chapter_<i>.pdf" for clarity.
Once all chapters are processed, we loop through the attachments and use agent.attach_file(...) to upload each file to the Orchestrator job as an attachment. The agent.attach_file method is provided by the UiPathAgent SDK; under the hood it uses the Orchestrator API (requiring the URL and token we configured earlier) to stream the file up. We pass the file name, content (as bytes), and MIME type ("application/pdf" in this case). If we wanted to also upload the raw images or any other files, we could similarly call attach_file for those – for example, adding PNG image bytes with the "image/png" MIME type, as sketched below. Finally, the workflow returns a simple dictionary result – here just indicating completion status and how many chapters were generated.
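For instance, a variant of the upload loop that also attaches the illustrations could look like the sketch below. It assumes generate_chapter is adjusted to return the raw image bytes alongside the PDF buffer, which the sample does not do out of the box:
# Hedged sketch: upload illustrations alongside the PDFs
# (assumes generate_chapter is modified to return (pdf_buffer, img_bytes))
for idx, chap in enumerate(outline):
    pdf_buf, img_bytes = generate_chapter(chap, idx)
    agent.attach_file(filename=f"chapter_{idx + 1}.pdf",
                      content=pdf_buf.getvalue(),
                      mime_type="application/pdf")
    agent.attach_file(filename=f"chapter_{idx + 1}.png",
                      content=img_bytes,
                      mime_type="image/png")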
When the workflow completes, you can navigate to the job details in Orchestrator, where all the uploaded PDFs are listed as attachments and can be downloaded directly. Storing outputs as attachments is a convenient way to ensure all result files are captured and auditable by the automation platform, rather than just left on a local filesystem.
Our Multi-Modal Book Writer Agent is just a starting point for your agentic automation goals. There are many ways to extend and refine this pattern to make it even more useful in an enterprise setting. Here are some ideas and next steps:
Store outputs in Storage Buckets: Instead of (or in addition to) attachments, you can save the generated PDFs to a UiPath Storage Bucket for longer-term storage or to be consumed by other automations. This would allow other processes or users to easily access the files without needing the original job context.
Add a review loop with a second agent: Consider adding an automated editorial step. For example, after the chapters are generated, have a second “Reviewer” agent take each chapter’s text and provide improvement suggestions or a quality score (see the sketch after this list). This could be another LlamaIndex agent (or even the same agent with a different workflow) that uses a prompt like “Proofread and suggest improvements for this chapter…”. The reviewer’s feedback could then be applied automatically or presented for human approval.
Human-in-the-loop via Action Center: Not everything should be fully automated. You can integrate UiPath Action Center to handle cases where the AI’s output might need human validation. For instance, if the AI’s confidence in the content is below a certain threshold (say 90%), the automation can create an Action Center task for a human reviewer to approve or edit the chapter before final PDF assembly. This ensures quality control for critical content.
Prompt versioning and reuse: As you refine your prompts for outline or chapter generation, keep versions of them and track which version was used for each generated document. This is important in enterprise settings for traceability – if a particular output had an issue, you can trace it back to the prompt version. You can also build a library of chapter templates or reusable content. For example, if certain chapters (like “Introduction” or “Conclusion”) tend to be similar across books, the agent could reuse or retrieve pre-approved text for those, using LlamaIndex’s retrieval capabilities to fetch relevant content.
Multi-language support: Extend the agent to generate content in different languages. This could be as simple as adding a language parameter to the workflow. You can then adjust the prompts to instruct the LLM to write in that language (or translate the output). For instance, generating a Spanish version of the book would involve prompting “Create an outline in Spanish about …” and similarly translating each chapter prompt. With UiPath, you could even have the agent produce one PDF per language for broader audience reach.
Advanced PDF formatting: The current solution uses basic ReportLab templates. You can enhance the PDF output by using custom page layouts, adding a cover page, page numbers, or a table of contents that hyperlinks to each chapter. You could also swap in an HTML-to-PDF converter (like using a browser or a service) if you prefer designing content in HTML/CSS for richer styling.
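To make the review-loop idea above concrete, a second pass could reuse the same agent.llm.chat interface with a proofreading prompt. The review_chapter helper below is a hypothetical sketch, not part of the sample:
# Hedged sketch: a simple reviewer pass over a generated chapter
def review_chapter(chapter_text: str) -> str:
    review_prompt = (
        "Proofread the following chapter and suggest improvements. "
        "Return a short, numbered list of concrete edits.\n\n"
        f"{chapter_text}"
    )
    return agent.llm.chat(review_prompt)
The returned feedback could then be fed into a rewrite prompt or surfaced to a human reviewer via Action Center.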
By implementing some of the above enhancements, you can evolve this book-writer into a general-purpose content automation agent. The combination of UiPath’s automation platform with LlamaIndex (and LLMs) provides a powerful foundation for such solutions – giving you the creativity of AI-generated content with the reliability and governance of enterprise automation. We encourage you to explore and customize the sample to your own use cases, whether it’s generating training manuals, assembling research reports, or creating multilingual marketing content.
Visit the UiPath LlamaIndex Python GitHub repo for full source code.