A real-time voice/call AI agent that lets you talk to a LangGraph agent over LiveKit, similar to "voice mode" experiences in ChatGPT Voice, OpenAI Realtime API sessions, and Gemini Live. This repo demonstrates adapting any LangGraph agent into a full-duplex, low-latency voice assistant using LiveKit Agents.
# LangGraph Voice Call Agent

A real-time voice/call AI agent that lets you talk to a LangGraph agent over LiveKit, similar to "voice mode" experiences in ChatGPT Voice, OpenAI Realtime API sessions, and Gemini Live. This repo demonstrates adapting any LangGraph agent into a full-duplex, low-latency voice assistant using LiveKit's real-time communication infrastructure.

This backend provides the core voice processing and AI agent functionality. It is built to work with LiveKit's real-time infrastructure and any frontend that supports LiveKit client connections.

## Features

- **Real-time voice interaction** with LangGraph agents
- **Full-duplex communication** with low-latency audio processing
- **Flexible LangGraph integration** - works with any LangGraph agent
- **Comprehensive audio pipeline** including VAD, STT, TTS, and turn detection
- **Thread-based conversation continuity** via participant metadata

## Project Structure

```
langgraph-voice-call-agent/
├── src/                          # Main source code
│   ├── livekit/                  # LiveKit agent implementation
│   │   ├── agent.py              # Main agent entrypoint
│   │   └── adapter/              # LangGraph integration
│   │       └── langgraph.py      # LangGraph adapter for LiveKit
│   └── langgraph/                # LangGraph agent definitions
│       └── agent.py              # An example agent
├── compose.yml                   # Docker Compose for local LiveKit server
├── pyproject.toml                # Python project configuration
├── uv.lock                       # uv dependency lock file
└── Makefile                      # Development commands
```

## How it works (high level)

1. **Agent Initialization** → LiveKit agent connects to room and waits for participants
2. **Audio Pipeline Setup** → VAD, STT, TTS, and turn detection models are loaded and configured
3. **LangGraph Integration** → Connect to LangGraph server
4. **Voice Processing** → Real-time audio is processed through the pipeline:
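To make the thread-continuity feature and the per-turn flow more concrete, here is a minimal, standard-library-only sketch. It is not the code in this repo and does not call LiveKit or LangGraph. The `thread_id` metadata key, the `resolve_thread_id` helper, and the `VoiceTurn` class are all hypothetical names chosen for illustration, and the speech and agent stages are stand-in callables.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


def resolve_thread_id(metadata: Optional[str]) -> str:
    """Return the thread ID carried in participant metadata, or a fresh one.

    Assumption: the metadata is a JSON object with a "thread_id" string key.
    """
    if metadata:
        try:
            data = json.loads(metadata)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            thread_id = data.get("thread_id")
            if isinstance(thread_id, str) and thread_id:
                return thread_id
    # No usable ID was supplied, so start a new conversation thread.
    return str(uuid.uuid4())


@dataclass
class VoiceTurn:
    """One user turn: speech-to-text, agent call, text-to-speech.

    Each stage is injected as a plain callable (hypothetical stand-ins for
    the real STT, LangGraph, and TTS components).
    """

    transcribe: Callable[[bytes], str]
    ask_agent: Callable[[str, str], str]  # (thread_id, user_text) -> reply
    synthesize: Callable[[str], bytes]

    def run(self, thread_id: str, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        reply = self.ask_agent(thread_id, text)
        return self.synthesize(reply)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any audio or model dependencies.
    turn = VoiceTurn(
        transcribe=lambda audio: audio.decode(),
        ask_agent=lambda tid, text: f"[{tid}] {text.upper()}",
        synthesize=lambda reply: reply.encode(),
    )
    tid = resolve_thread_id('{"thread_id": "abc-123"}')
    print(turn.run(tid, b"hello"))  # prints b'[abc-123] HELLO'
```

Passing the same thread ID on every turn is what would let the agent resume an earlier conversation. Falling back to a new UUID keeps a participant without metadata from sharing state with anyone else.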
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.