Building a Realtime Voice RAG Agent with ADK Web UI and Gemini File Search
A Quick Tutorial on Developing with ADK + RAG
Google recently released two tools that make building realtime-voice-enabled document search assistants significantly simpler. First, the File Search Tool—a fully managed RAG system where you upload documents and it handles chunking, embeddings, and retrieval automatically. Second, the ADK Web UI—a built-in interface that provides text chat, voice interaction, file uploads, and debugging tools without writing any frontend code.
This tutorial shows how to combine these tools to build a production-ready RAG assistant. We’ll start with a basic agent, add file upload handling, integrate document search, and finish with voice interaction. By the end, you’ll have a system where users can upload a PDF, click the microphone, and ask “What’s in this document?”—all in about 500 lines of Python.
Let’s see the live demo first:
Before we get started, you’ll need ADK installed and a Gemini API key ready.
Understanding the Tools
File Search API
The File Search Tool is a managed RAG pipeline built into the Gemini API. You create a store, upload documents, and it handles everything under the hood—chunking, embeddings with gemini-embedding-001, vector search, and retrieval.
The pricing model is interesting. Storage is free. Query-time embeddings are free. You only pay once when you first index a file—$0.15 per million tokens. So if you upload a 100-page PDF, you pay the indexing cost once, then all searches are free.
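To make the one-time indexing cost concrete, here is a rough back-of-the-envelope calculation. The $0.15-per-million-tokens rate comes from the pricing above; the ~500 tokens-per-page figure is an assumption for illustration, not an official number:

```python
# One-time File Search indexing cost; storage and query embeddings are free.
PRICE_PER_MILLION_TOKENS = 0.15  # USD, charged once at indexing time

def indexing_cost(pages: int, tokens_per_page: int = 500) -> float:
    """Estimate the one-time cost to index a document.

    tokens_per_page is a rough assumption; real documents vary.
    """
    tokens = pages * tokens_per_page
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(f"${indexing_cost(100):.4f}")  # 100 pages -> 50,000 tokens -> $0.0075
```

Under these assumptions, indexing a 100-page PDF costs well under a cent, and every search after that is free.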
Here’s a quick example of how simple it is:
from google import genai
from google.genai import types

client = genai.Client(api_key='your-key')

# Create a File Search store
store = client.file_search_stores.create()

# Upload a document
client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=store.name,
    file='document.pdf'
)

# Search it
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='What does this document say?',
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)
print(response.text)

# Access citations
grounding = response.candidates[0].grounding_metadata
sources = [c.retrieved_context.title for c in grounding.grounding_chunks]
That’s it. No chunking logic, no embedding models to manage, no vector database setup. File Search handles it all.
ADK Web UI
The ADK Web UI is what makes rapid development possible. You define an agent, run adk web, and you get a full interface with text chat, voice streaming, video input, file uploads, session state viewer, and a trace debugger.
Here’s the simplest possible agent:
from google.adk.agents import LlmAgent

root_agent = LlmAgent(
    name="MyAgent",
    model="gemini-2.5-flash-native-audio-preview-09-2025",
    instruction="You are a helpful assistant.",
)
Save that as agent.py, run adk web, open http://localhost:8080, and you’re done. The UI automatically provides:
Text input for testing conversations
Microphone button for voice interaction
File upload button for documents
Session state panel showing what’s stored
Trace viewer for debugging tool calls
Compare that to building your own web app. You’d need a FastAPI server, WebSocket handlers, audio encoding/decoding, session management, and a custom frontend. The adk web command gives you all of that for free.
Implementation
Step 1. Multi-Agent Design for RAG
Now let’s get to the interesting part: building a RAG system that handles file uploads and document search using these two building blocks.
The Three-Agent Design
Our system uses three specialized agents working together. Here’s the structure:
# Agent 1: File management
file_manager = LlmAgent(
    name="FileManagerAgent",
    tools=[list_uploaded_files, index_uploaded_file],
    instruction="Your job is to index uploaded files."
)

# Agent 2: Document search
search_assistant = LlmAgent(
    name="SearchAssistantAgent",
    tools=[search_documents],
    instruction="Your job is to answer questions from documents."
)

# Agent 3: Orchestrator
orchestrator = RAGOrchestrator(
    file_manager=file_manager,
    search_assistant=search_assistant
)
Agent Roles and Responsibilities
Let me break down each agent’s role.
FileManagerAgent handles uploads. When the UI’s upload button is used, files come in as inline_data in message parts. This agent has two tools:
list_uploaded_files - Checks what’s available
index_uploaded_file - Uploads to the File Search store
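The core logic of these two tools can be sketched as plain Python functions. The function names come from the article, but the signatures and bodies below are assumptions: the real tools would receive uploaded files from the ADK session and call client.file_search_stores.upload_to_file_search_store, which is stubbed out here:

```python
# Hypothetical sketch of the FileManagerAgent tools. Real ADK tool signatures
# and session plumbing will differ; this shows only the decision logic.

def list_uploaded_files(uploaded: dict[str, bytes]) -> list[str]:
    """Return the names of files the user has attached this session."""
    return sorted(uploaded)

def index_uploaded_file(name: str, uploaded: dict[str, bytes],
                        indexed: set[str]) -> str:
    """Push one attached file into the File Search store (idempotent)."""
    if name not in uploaded:
        return f"{name} has not been uploaded"
    if name in indexed:
        return f"{name} is already indexed"
    # The real tool would call
    # client.file_search_stores.upload_to_file_search_store(...) here.
    indexed.add(name)
    return f"indexed {name}"
```

Making indexing idempotent matters because the LLM may call the tool more than once for the same file; tracking indexed names avoids paying the indexing fee twice.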
SearchAssistantAgent handles queries. It has one tool:
search_documents - Queries the File Search store
RAGOrchestrator routes requests. It’s a custom BaseAgent that detects file uploads and sends them to FileManagerAgent, and routes questions to SearchAssistantAgent. It has no tools of its own—just coordination logic.
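The orchestrator’s routing decision can be sketched as follows. This assumes messages arrive as a list of parts where attachments carry inline_data (as described above); the actual BaseAgent subclass and ADK event plumbing are omitted:

```python
# Minimal sketch of the RAGOrchestrator routing rule. Parts are modeled as
# dicts; the real implementation would inspect ADK message part objects.

def has_file_upload(parts: list[dict]) -> bool:
    """True when any message part carries attached file bytes."""
    return any("inline_data" in p for p in parts)

def route(parts: list[dict]) -> str:
    """Uploads go to the file manager; everything else goes to search."""
    return "FileManagerAgent" if has_file_upload(parts) else "SearchAssistantAgent"
```

Keeping this rule in a plain function makes it easy to unit-test the routing without spinning up any agents.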