Real-time voice agents with parallel async background sub-agents — conversations continue naturally while tasks run • Join the builders → https://discord.gg/mqxKaN3UKC
<p align="center">
<img src="docs/images/icon_graphic_only.png" alt="Bodhi Agent" width="120" />
</p>
# Bodhi: High-Performance, Multi-Agent Realtime Voice Stack
[npm: bodhi-realtime-agent](https://www.npmjs.com/package/bodhi-realtime-agent)
[License: MIT](https://opensource.org/licenses/MIT)
**Conversational voice agents with non-blocking background execution.**
(No other framework does this out of the box.)
---
Most voice agent frameworks block the conversation while tools execute: a user says "generate me a video" and the agent goes silent for two minutes. Bodhi splits the work. **Main agents** handle the live conversation (Gemini Live API by default, OpenAI Realtime also supported), while **background subagents** (Vercel AI SDK) run long tasks in parallel. When a task finishes, the main agent announces the result in the flow of the conversation.
```
User: "Make me a video of a sunset AND search for weather in Tokyo"
Main Agent (Gemini/OpenAI Live — realtime voice):
"I'm generating your video and looking up the weather..."
│
├─ Subagent 1: Video generation (2 min, Veo API)
├─ Subagent 2: Web search (3 sec, Gemini)
│
"What else can I help with?" ← keeps talking
│
[3s] "The weather in Tokyo is..." ← search result arrives
[2m] "Your video is ready!" ← video arrives
```
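The flow above can be approximated in plain TypeScript. This is a minimal sketch of the non-blocking pattern only, not Bodhi's actual API: `BackgroundTask`, `simulateWork`, `dispatchInBackground`, and `demo` are hypothetical names invented for illustration, and timers stand in for the real video and search calls.

```typescript
// Hypothetical sketch: none of these names come from the Bodhi API.
type Announce = (message: string) => void;

interface BackgroundTask {
  name: string;
  run: () => Promise<string>;
}

// Resolve with `result` after `ms` milliseconds, standing in for a slow tool call.
function simulateWork(ms: number, result: string): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(result), ms));
}

// Start every task without awaiting it, so the caller can keep talking.
// Each task announces its own result (or failure) as soon as it settles.
function dispatchInBackground(
  tasks: BackgroundTask[],
  announce: Announce,
): Promise<void> {
  const pending = tasks.map((task) =>
    task
      .run()
      .then((result) => announce(`${task.name}: ${result}`))
      .catch((err: unknown) => announce(`${task.name} failed: ${String(err)}`)),
  );
  return Promise.all(pending).then(() => undefined);
}

async function demo(): Promise<string[]> {
  const transcript: string[] = [];
  const announce: Announce = (message) => {
    transcript.push(message);
  };

  // Kick off both tasks; the returned promise is only used to know when to stop.
  const allDone = dispatchInBackground(
    [
      { name: "video", run: () => simulateWork(50, "ready") },
      { name: "search", run: () => simulateWork(10, "sunny in Tokyo") },
    ],
    announce,
  );

  // The conversation continues immediately; nothing above was awaited yet.
  announce("agent: What else can I help with?");

  await allDone;
  return transcript;
}
```

In this sketch, `demo()` resolves with the agent's follow-up question first, then the search result, then the video result, because each task reports back on its own schedule rather than in dispatch order.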
### Claude Code Demo — Voice-Driven Coding Assistant
<p align="center">
<a href="https://youtube.com/watch?v=n3kNK-4expo">
<img src="https://img.youtube.com/vi/n3kNK-4expo/maxresdefault.jpg" alt="Watch the Claude Code demo" width="700" />
</a>
<br />
<em>Click to watch the demo</em>
</p>
Speak coding requests naturally — Claude Code reads, edits, creates files, runs commands, and searches your codebase in the background while you keep talking. See [examples/claude_code](examples/claude_code/) for setup.
### OpenClaw Integration — Voice-Driven AI Agent