A real-time voice AI agent powered by LiveKit and Google's Gemini Realtime API, enabling natural conversational interactions through a browser-based voice interface.
# Voice Agent Prototype

A real-time voice AI agent using LiveKit and Google's Gemini Realtime API for natural conversation.

## What It Does

- Real-time bidirectional voice conversation with AI
- Natural speech processing and response generation
- Web-based interface for easy access
- Continuous conversation flow (not just single responses)

## Built With

- **LiveKit** - Real-time audio streaming
- **Google Gemini API** - AI conversation model
- **Flask** - Web backend for token generation
- **HTML/JavaScript** - Browser-based voice interface

## Quick Start

### Option 1: Docker (Recommended)

**Prerequisites:** Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)

1. **Set up environment**:
   ```bash
   cp env.example .env
   ```
   Edit `.env` with your LiveKit and Google Cloud credentials.

2. **Run with Docker**:
   ```bash
   docker-compose up --build
   ```

3. **Start conversation**:
   - Open http://localhost:5000
   - Click "Join Conversation"
   - Allow microphone access
   - Start talking with the AI agent

### Option 2: Local Development

1. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Set up environment**:
   ```bash
   cp env.example .env
   ```
   Edit `.env` with your LiveKit and Google Cloud credentials.

3. **Run the application**:
   ```bash
   python run_webui.py
   ```

4. **Start conversation**:
   - Open http://localhost:5000
   - Click "Join Conversation"
   - Allow microphone access
   - Start talking with the AI agent

## Environment Variables

```
LIVEKIT_URL=wss://your-livekit-server.livekit.cloud
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
GOOGLE_API_KEY=your_google_api_key
```
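## Token Generation Sketch

The Flask backend is listed above as handling token generation, and `LIVEKIT_API_KEY` and `LIVEKIT_API_SECRET` are the credentials that step would need. The project's actual code is not shown here, so the following is only a minimal sketch of what minting a room-join token could look like. It assumes a LiveKit access token is an HS256-signed JWT with the API key as issuer, the participant identity as subject, and a `video` grant naming the room. The function name `make_join_token`, the helper `_b64url`, and the identity and room values are hypothetical. A real backend would more likely call the official LiveKit server SDK than sign tokens by hand.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_join_token(api_key: str, api_secret: str, identity: str,
                    room: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT carrying a LiveKit-style room-join grant.

    Hypothetical helper for illustration; the claim layout is an assumption
    about the token format, not code taken from this project.
    """
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,            # API key identifies the issuer
        "sub": identity,           # participant identity
        "nbf": now,                # not valid before now
        "exp": now + ttl_seconds,  # expiry time
        "video": {"room": room, "roomJoin": True},
    }
    # Header and payload are each JSON-serialized, then base64url-encoded.
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    # The signature is an HMAC-SHA256 over "header.payload" with the secret.
    signature = hmac.new(api_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

In a Flask route, a helper like this would be called with the values read from `.env`, and the resulting string returned to the browser so the page can connect to `LIVEKIT_URL`. Keeping the secret on the server is the reason the token is minted in the backend rather than in the HTML/JavaScript client.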
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.