Lab For AI

Building a Realtime Voice RAG Agent with ADK Web UI and Gemini File Search

A Quick Tutorial on Developing with ADK + RAG

Yeyu Huang's avatar
Yeyu Huang
Nov 17, 2025

Google recently released two tools that make building realtime-voice-enabled document search assistants significantly simpler. First, the File Search Tool—a fully managed RAG system where you upload documents and it handles chunking, embeddings, and retrieval automatically. Second, the ADK Web UI—a built-in interface that provides text chat, voice interaction, file uploads, and debugging tools without writing any frontend code.

This tutorial shows how to combine these tools to build a production-ready RAG assistant. We’ll start with a basic agent, add file upload handling, integrate document search, and finish with voice interaction. By the end, you’ll have a system where users can upload a PDF, click the microphone, and ask “What’s in this document?”—all in about 500 lines of Python.

Let’s see the live demo first:

Before we get started, you’ll need ADK installed and a Gemini API key ready.

Understanding the Tools

File Search API

The File Search Tool is a managed RAG pipeline built into the Gemini API. You create a store, upload documents, and it handles everything under the hood—chunking, embeddings with gemini-embedding-001, vector search, and retrieval.

The pricing model is interesting. Storage is free. Query-time embeddings are free. You only pay once when you first index a file—$0.15 per million tokens. So if you upload a 100-page PDF, you pay the indexing cost once, then all searches are free.
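To make that cost concrete, here is the arithmetic as a tiny helper. The per-page token count is a rough assumption for illustration; actual counts depend on the document:

```python
def indexing_cost_usd(tokens: int, rate_per_million: float = 0.15) -> float:
    """One-time File Search indexing cost at $0.15 per million tokens."""
    return tokens / 1_000_000 * rate_per_million

# A 100-page PDF at a rough ~500 tokens per page is ~50,000 tokens,
# so indexing it once costs under a cent; every search after that is free.
cost = indexing_cost_usd(100 * 500)
print(f"${cost:.4f}")  # $0.0075
```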

Here’s a quick example of how simple it is:

from google import genai
from google.genai import types

client = genai.Client(api_key="your-key")
store = client.file_search_stores.create()

# Upload a document
client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=store.name,
    file="document.pdf"
)

# Search it
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What does this document say?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[store.name]
            )
        )]
    )
)

print(response.text)

# Access citations
grounding = response.candidates[0].grounding_metadata
sources = [c.retrieved_context.title for c in grounding.grounding_chunks]

That’s it. No chunking logic, no embedding models to manage, no vector database setup. File Search handles it all.

ADK Web UI

The ADK Web UI is what makes rapid development possible. You define an agent, run adk web, and you get a full interface with text chat, voice streaming, video input, file uploads, session state viewer, and a trace debugger.

Here’s the simplest possible agent:

from google.adk.agents import LlmAgent

root_agent = LlmAgent(
    name="MyAgent",
    model="gemini-2.5-flash-native-audio-preview-09-2025",
    instruction="You are a helpful assistant.",
)

Save that as agent.py, run adk web, open http://localhost:8080, and you’re done. The UI automatically provides:

  • Text input for testing conversations

  • Microphone button for voice interaction

  • File upload button for documents

  • Session state panel showing what’s stored

  • Trace viewer for debugging tool calls

Compare that to building your own web app. You’d need a FastAPI server, WebSocket handlers, audio encoding/decoding, session management, and a custom frontend. The adk web command gives you all of that for free.

Implementation

Step 1. Multi-Agent Design for RAG

Now let’s get to the interesting part—building a RAG system that handles file uploads and document search using these two tools.

The Three-Agent Design

Our system uses three specialized agents working together. Here’s the structure:

# Agent 1: File management
file_manager = LlmAgent(
    name="FileManagerAgent",
    tools=[list_uploaded_files, index_uploaded_file],
    instruction="Your job is to index uploaded files."
)

# Agent 2: Document search
search_assistant = LlmAgent(
    name="SearchAssistantAgent",
    tools=[search_documents],
    instruction="Your job is to answer questions from documents."
)

# Agent 3: Orchestrator
orchestrator = RAGOrchestrator(
    file_manager=file_manager,
    search_assistant=search_assistant
)

Agent Roles and Responsibilities

Let me break down each agent’s role.

FileManagerAgent handles uploads. When the UI’s upload button is used, files come in as inline_data in message parts. This agent has two tools:

  • list_uploaded_files - Checks what’s available

  • index_uploaded_file - Uploads to File Search store
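The bodies of these two tools aren’t shown in this excerpt, so here is a minimal sketch of what their contracts might look like. The state dict and return shapes are my assumptions (ADK only requires that tools return JSON-serializable values), and a real index_uploaded_file would call client.file_search_stores.upload_to_file_search_store instead of flipping a flag:

```python
def list_uploaded_files(uploaded: dict) -> dict:
    """Report which uploads exist and which still need indexing."""
    return {
        "files": sorted(uploaded),
        "pending": sorted(name for name, done in uploaded.items() if not done),
    }

def index_uploaded_file(uploaded: dict, filename: str) -> dict:
    """Mark one upload as indexed (a stand-in for the File Search upload call)."""
    if filename not in uploaded:
        return {"status": "error", "message": f"{filename} was never uploaded"}
    uploaded[filename] = True
    return {"status": "indexed", "file": filename}

state = {"report.pdf": False}          # one upload, not yet indexed
index_uploaded_file(state, "report.pdf")
print(list_uploaded_files(state))      # {'files': ['report.pdf'], 'pending': []}
```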

SearchAssistantAgent handles queries. It has one tool:

  • search_documents - Queries the File Search store

RAGOrchestrator routes requests. It’s a custom BaseAgent that detects file uploads and routes to FileManager, or routes questions to SearchAssistant. No tools itself, just coordination logic.
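The detection logic the orchestrator needs is small enough to pull out into a plain function. This sketch assumes message parts arrive as dicts with an optional inline_data entry, which mirrors (but is not) the ADK Part type:

```python
def route_request(parts: list) -> str:
    """Route to FileManagerAgent if any part carries uploaded bytes,
    otherwise to SearchAssistantAgent."""
    if any(part.get("inline_data") for part in parts):
        return "FileManagerAgent"
    return "SearchAssistantAgent"

upload = [{"inline_data": {"mime_type": "application/pdf", "data": b"%PDF"}}]
question = [{"text": "What does this document say?"}]
print(route_request(upload))    # FileManagerAgent
print(route_request(question))  # SearchAssistantAgent
```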
