How to Build a Serverless Android App with Realtime Streaming by Gemini 2.0 Multimodal Live API

Gemini Development Tutorial V8

Yeyu Huang
Feb 10, 2025
Image by Author

In this tutorial, we'll build a serverless, real-time streaming Android app using Google's Gemini 2.0 and its Multimodal Live API. Many readers have asked for a mobile version of the real-time streaming interactions explored in previous projects, which covered voice and video chat, screen sharing, a drawing canvas, and RAG-based QA in web applications, plus an initial real-time chat app on Android. This new Android app replicates those features while improving data streaming efficiency and the overall real-time multimodal experience.

Here is the demo video:

This article will guide you through the app's main components, focusing on how it eliminates the need for a backend server and communicates directly with the Gemini API over a raw WebSocket implementation.

Gemini 2.0 Updates

Recently, Google announced that Gemini 2.0 Flash is now generally available through the Gemini API in Google AI Studio and Vertex AI, a key step in its transition from experimental to production-ready. They've also released an experimental version of Gemini 2.0 Pro, which they call their best model yet for coding and complex prompts, and a new cost-efficient model, Gemini 2.0 Flash-Lite.

Model list from Google DeepMind

However, none of these recent releases supports the Multimodal Live API, which is still labeled experimental in the list of Gemini capabilities. The 2.0 Flash model card shows audio output as "coming soon", the Flash-Lite preview model supports only text output, and the experimental 2.0 Pro model is not yet compatible with the Multimodal Live API either. So for now, let's concentrate on the Flash preview model.

Architecture: Serverless Approach

Our previous Android projects relied on a backend server to manage communication with the Gemini API, which added complexity and overhead. A backend WebSocket server still makes sense for commercial use, where you need server-side controls and management; for a personal assistant app, however, maintaining a server is an unnecessary cost. The new app eliminates the backend entirely, communicating directly with the Gemini API through a raw WebSocket implementation. This serverless approach simplifies deployment and cuts maintenance costs.
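To make this concrete, here is a minimal sketch of the direct connection, assuming OkHttp as the WebSocket client and the public v1alpha BidiGenerateContent endpoint that the Multimodal Live API exposed at the time of writing. The GeminiLiveClient class name and structure are illustrative only, not the exact layout of this project's code:

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.Response
import okhttp3.WebSocket
import okhttp3.WebSocketListener
import okio.ByteString

// Illustrative wrapper around the Gemini Live WebSocket (names are hypothetical).
class GeminiLiveClient(private val apiKey: String) {
    private val client = OkHttpClient()
    private var webSocket: WebSocket? = null

    // The Live API endpoint at the time of writing; no backend server in between.
    private val url =
        "wss://generativelanguage.googleapis.com/ws/" +
        "google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent?key=$apiKey"

    fun connect() {
        val request = Request.Builder().url(url).build()
        webSocket = client.newWebSocket(request, object : WebSocketListener() {
            override fun onOpen(ws: WebSocket, response: Response) {
                // The first message must be a setup frame selecting the model
                // and the response modality (AUDIO for spoken replies).
                ws.send(
                    """{"setup": {"model": "models/gemini-2.0-flash-exp",
                       "generation_config": {"response_modalities": ["AUDIO"]}}}"""
                )
            }

            override fun onMessage(ws: WebSocket, bytes: ByteString) {
                // Server events (text, audio chunks, turn signals) arrive here.
                handleServerMessage(bytes)
            }

            override fun onFailure(ws: WebSocket, t: Throwable, response: Response?) {
                // Reconnect or surface the error to the UI.
            }
        })
    }

    fun send(json: String) { webSocket?.send(json) }

    private fun handleServerMessage(bytes: ByteString) {
        // Parse the JSON payload, play audio, update the chat UI, etc.
    }
}
```

Because the API key is embedded in the client, this pattern suits a personal app; a commercial release would still want a proxy in front to keep the key off the device, which is exactly the trade-off discussed above.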

System Architecture

Key Improvements

  • Serverless Architecture: Eliminates the need for a backend server, decreasing complexity and cost.

  • Real-time Image Capture: Continuously captures camera frames and streams them to the model, enabling a more natural, intuitive interaction (see the sketch below).
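As a rough illustration of the second point, the app could capture a camera frame on a timer, JPEG-encode it, and push it into the live session as a realtime_input media chunk. The helper below is a hypothetical sketch building on the GeminiLiveClient wrapper above, not the article's exact code:

```kotlin
import android.graphics.Bitmap
import android.util.Base64
import java.io.ByteArrayOutputStream

// Hypothetical helper: compress one camera frame and stream it to the model.
// Called periodically (e.g., once per second) while the preview is running.
fun sendFrame(liveClient: GeminiLiveClient, frame: Bitmap) {
    val jpegBytes = ByteArrayOutputStream().use { out ->
        // Moderate JPEG quality keeps each chunk small enough for realtime streaming.
        frame.compress(Bitmap.CompressFormat.JPEG, 70, out)
        out.toByteArray()
    }
    val b64 = Base64.encodeToString(jpegBytes, Base64.NO_WRAP)
    liveClient.send(
        """{"realtime_input": {"media_chunks": [
           {"mime_type": "image/jpeg", "data": "$b64"}]}}"""
    )
}
```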

Code Walkthrough: MainActivity.kt

The MainActivity.kt file is the heart of the app. Let's examine its main components.
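Before diving in, here is a speculative skeleton of how such an activity fits together; the actual MainActivity.kt may be organized differently. Voice and video streaming need the RECORD_AUDIO and CAMERA runtime permissions before the session can start:

```kotlin
import android.Manifest
import android.os.Bundle
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity

// Speculative skeleton only -- the article's MainActivity.kt may differ.
class MainActivity : AppCompatActivity() {

    // Wrapper sketched earlier; in practice, load the key from BuildConfig
    // or secure storage rather than hard-coding it.
    private val liveClient = GeminiLiveClient(apiKey = "YOUR_API_KEY")

    // Voice and video streaming require runtime permissions on modern Android.
    private val permissionLauncher =
        registerForActivityResult(ActivityResultContracts.RequestMultiplePermissions()) { results ->
            if (results.values.all { it }) startStreaming()
        }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        permissionLauncher.launch(
            arrayOf(Manifest.permission.RECORD_AUDIO, Manifest.permission.CAMERA)
        )
    }

    private fun startStreaming() {
        liveClient.connect()
        // From here: start AudioRecord capture and the camera preview,
        // forwarding PCM chunks and periodic frames (see sendFrame above).
    }
}
```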
