Lab For AI

How to Build RAG Agents in OpenAI's Swarm Framework
A Guide for Integrating LlamaIndex into Swarm's Agents

Yeyu Huang
Oct 31, 2024
Image by Author

In our last tutorial, we implemented a web UI for a Swarm application shortly after OpenAI released the framework. Now, let’s look more closely at the nature of Swarm. Swarm is a lightweight framework built around “routines” and “handoffs”, and as an open platform, it does not bake in complex prompting or orchestration strategies. If you are wondering whether it supports RAG (Retrieval-Augmented Generation), the answer is that RAG sits at a higher layer of Swarm’s architecture and can be implemented easily through its function interface. This allows you to create powerful RAG-based agents that generate content from external documents, answer questions from human users, and deliver those answers to other agents in the workflow, orchestrated by your handoff design.

In this tutorial, we will create a Swarm application in which a RAG agent built with LlamaIndex answers user queries about external PDF files. To keep the demonstration simple, we will first implement a minimal Swarm application that includes only a triage agent and a RAG agent. Then, we will add more agents that work with the content generated by the RAG agent and produce further outputs for the user’s task. Moreover, as an optimization, the local PDF will be embedded into a local vector store, saving the time and cost of repeatedly embedding the same document.

Swarm

If you are unfamiliar with Swarm, OpenAI’s experimental multi-agent project, let’s quickly review it.

Swarm is designed to make agent coordination and execution lightweight, highly controllable, and easily testable. It achieves this through two core concepts: Agents and handoffs. An Agent includes instructions and functions, and can hand a conversation off to another Agent at any time. Compared to other multi-agent frameworks, Swarm’s lightweight nature makes its orchestration more controllable: very few system prompts are forced on you by the framework, and adding functions and behaviours to agents requires little effort or learning curve.
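The routine-and-handoff mechanic can be illustrated without the framework itself. In this hypothetical, framework-free sketch (the names are illustrative, not Swarm’s actual API), an agent bundles instructions with functions, and a function that returns another agent is what signals a handoff:

```python
# Framework-free sketch of Swarm-style handoffs (illustrative names,
# not Swarm's actual API): a function returning another agent
# transfers the conversation to that agent.
from dataclasses import dataclass, field


@dataclass
class MiniAgent:
    name: str
    instructions: str
    functions: list = field(default_factory=list)


rag_agent = MiniAgent("RAG Agent", "Answer questions from the indexed PDF.")


def transfer_to_rag():
    """Hand the conversation off to the RAG agent."""
    return rag_agent


triage_agent = MiniAgent(
    "Triage Agent",
    "Route document questions to the RAG agent.",
    [transfer_to_rag],
)


def run_turn(agent, user_message):
    # A real loop would let the LLM decide which function to call;
    # here we simply call each one and follow any agent it returns.
    for fn in agent.functions:
        result = fn()
        if isinstance(result, MiniAgent):
            agent = result  # the handoff: the active agent changes
    return agent


active = run_turn(triage_agent, "What does the PDF say about pricing?")
print(active.name)  # the RAG agent is now the active agent
```

The key design point mirrors Swarm’s: agents stay small and stateless, and control flow lives in which agent is currently “active”.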

Image from Swarm Repository

LlamaIndex

Swarm has no built-in RAG assistant or interface, but we can easily implement a RAG process using LlamaIndex and register that process with a Swarm agent as a function. If you have built document retrieval or search features into an LLM application before, you have likely encountered LlamaIndex, a powerful framework for building context-augmented LLM applications. It provides comprehensive tools, including data connectors for ingesting data from sources like PDFs, SQL databases and external APIs; data indexes for efficient embedding and storage; query and chat engines; observability features for monitoring and evaluation; and flexible workflows to combine all these components. This makes it an ideal choice for implementing RAG functionality within our Swarm agents.

The quickest complete RAG flow, using the default OpenAI GPT model, needs only five lines of code:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Some question about the data should go here")
print(response)

This code loads every document in the local data directory (a PDF, in our case), embeds the content into an in-memory vector store, and then uses the retrieved content to answer a question.
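Because the index lives in memory, every run re-embeds the same document, which costs time and money. The caching optimization mentioned in the introduction can be sketched framework-free like this, where `embed_fn` stands in for any paid embedding call (all names here are hypothetical):

```python
# Hypothetical sketch: persist embedding vectors on disk keyed by a
# content hash, so re-running the script does not re-embed (and
# re-bill) text it has already seen.
import hashlib
import json
import os


def cached_embed(text, embed_fn, cache_dir="embed_cache"):
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):      # cache hit: no API call made
        with open(path) as f:
            return json.load(f)
    vector = embed_fn(text)       # cache miss: embed exactly once
    with open(path, "w") as f:
        json.dump(vector, f)
    return vector
```

LlamaIndex supports the same idea natively at the index level, via `index.storage_context.persist()` and `load_index_from_storage`, which is the approach this tutorial’s optimization builds on.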

Code Walkthrough

Now, let’s walk through the code implementation of a RAG agent in the Swarm framework.

In this code, I will explicitly specify GPT-4o for question answering and an embedding model from FireworksAI for embedding and retrieval. I choose Fireworks because its embedding API offers lower cost with equivalent or better performance compared to OpenAI’s. If you don’t have a Fireworks account, you can easily create one, or simply use the OpenAI embedding model, which is LlamaIndex’s default.

The price of the embedding models on FireworksAI

First, install the LlamaIndex packages (including the Fireworks embedding module) and Swarm via pip.

pip install llama-index llama-index-embeddings-fireworks
pip install git+https://github.com/openai/swarm.git

Now, we need to import the necessary libraries and set up the environment with your OpenAI and Fireworks API keys.

from swarm import Swarm, Agent, Result
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.fireworks import FireworksEmbedding
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = "sk-your-openai-api-key"
os.environ["FIREWORKS_API_KEY"] = "fw-your-fireworks-api-key"
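With the keys in place, one plausible next step is to point LlamaIndex’s global `Settings` at GPT-4o and the Fireworks embedding model. This is a sketch of that configuration; the exact constructor parameters and the embedding model name should be checked against the LlamaIndex and Fireworks documentation:

```python
# Sketch: route all LlamaIndex calls through GPT-4o (for QA) and a
# Fireworks embedding model (for embedding/retrieval). The model
# choices here are assumptions; check the Fireworks catalog for
# current model names and pricing.
Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = FireworksEmbedding()  # uses the package's default model
```

Setting these globally means every index and query engine built afterwards inherits them, so we don’t have to pass models around explicitly.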

Before creating any agents, we should build a function for the RAG process.
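As a preview of the shape such a function takes: Swarm tools are plain Python callables, and their docstrings describe them to the model. The sketch below uses a stand-in engine so it runs without LlamaIndex; in the real code, `query_engine` would be the engine built from the PDF index via `index.as_query_engine()`, and all names here are illustrative:

```python
# Illustrative sketch of a RAG function a Swarm agent could register.
# _StubEngine stands in for the LlamaIndex query engine built earlier;
# in real code you would use index.as_query_engine() instead.
class _StubEngine:
    def query(self, question):
        return f"(retrieved answer for: {question})"


query_engine = _StubEngine()  # placeholder for the real LlamaIndex engine


def query_docs(query: str) -> str:
    """Answer the user's question from the indexed PDF content."""
    return str(query_engine.query(query))


# A Swarm agent would then list query_docs in its `functions` so the
# model can call it during a conversation.
print(query_docs("What is the refund policy?"))
```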
