
Building Voice-Enabled AI Agents with Claude + Deepgram Speech-to-Text

Claude Directory January 10, 2026


Build voice-enabled AI agents with Claude AI and Deepgram for real-time speech-to-text transcription and natural responses. Transform text chats into immersive voice conversations.

# Why Build Voice-Enabled Agents with Claude + Deepgram?

Voice is becoming a primary interface for AI agents. Users expect hands-free, natural conversations for tasks like customer support, virtual assistants, or smart home control. Claude excels at reasoning and context, but it takes text as input. Pairing it with Deepgram's low-latency speech-to-text (STT) makes real-time voice agents possible.

**Problem:** Slow or inaccurate STT breaks conversation flow, and Claude needs clean transcripts to respond well.

**Solution:** Deepgram's Nova-2 model targets high accuracy and sub-300 ms streaming latency, and its `smart_format` option punctuates transcripts. Stream audio from your mic, transcribe it live, feed the transcript to Claude's Messages API for contextual replies, and add text-to-speech (TTS) for output.

This guide builds a Python CLI agent that handles continuous conversations. It is beginner-friendly, and the same structure can be extended for production.

## Architecture Overview

```
Microphone (PyAudio) → Deepgram Live STT (WebSocket) → Transcript → Claude Messages API → Response → TTS (pyttsx3) → Speakers
```

Key benefits:

- **Real-time:** Audio is transcribed as you speak, with no upload-then-wait step.
- **Contextual:** The agent keeps conversation history and sends it to Claude on every turn.
- **Claude-specific:** Uses the Anthropic SDK with a system prompt to define agent behavior.
- **Offline TTS:** pyttsx3 works without internet.

## Prerequisites

1. **API Keys:** export both as environment variables (`DEEPGRAM_API_KEY` and `ANTHROPIC_API_KEY`).
   - [Deepgram](https://console.deepgram.com/signup): new accounts include free credit.
   - [Anthropic](https://console.anthropic.com/): Claude API key.
2. **Python Environment:** Python 3.10+.
3. **Hardware:** Microphone and speakers.

## Step 1: Install Dependencies

Create a virtual environment and install:

```bash
python -m venv .venv
source .venv/bin/activate
pip install anthropic "deepgram-sdk>=3,<4" pyaudio pyttsx3
```

**Notes:**

- `pyaudio` may need system dependencies (e.g., `apt install portaudio19-dev` on Ubuntu).
- The Deepgram SDK handles the WebSocket connection and microphone streaming. The examples in this guide assume the 3.x SDK interface; other major versions expose a different client, so pin the version if imports fail.
- `pyttsx3` provides cross-platform TTS.

## Step 2: Deepgram Live Transcription Setup

Deepgram's Python SDK simplifies streaming microphone audio over a WebSocket.
Here's a basic live listener:

```python
import os

from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents, Microphone

# Assumes deepgram-sdk 3.x and DEEPGRAM_API_KEY exported in the environment
client = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
connection = client.listen.live.v("1")


def on_message(self, result, **kwargs):
    """Print each non-empty transcript as it arrives."""
    if result.channel and result.channel.alternatives:
        transcript = result.channel.alternatives[0].transcript
        if transcript:
            print(f"Transcript: {transcript}")
            # TODO: Send to Claude


# Register the callback for transcript events
connection.on(LiveTranscriptionEvents.Transcript, on_message)

options = LiveOptions(
    model="nova-2",
    language="en",
    smart_format=True,
    interim_results=True,  # Partial transcripts for speed
    encoding="linear16",   # Raw PCM, as produced by the Microphone helper
    channels=1,
    sample_rate=16000,
)

if not connection.start(options):
    raise RuntimeError("Failed to connect to Deepgram")

# Stream the default microphone into the open connection
microphone = Microphone(connection.send)
microphone.start()
input("Listening. Press Enter to stop.\n")
microphone.finish()
connection.finish()
```

This connects to `wss://api.deepgram.com/v1/listen` and prints punctuated transcripts as they arrive.

## Step 3: Claude Integration with Conversation History

Use Anthropic's SDK for streaming or synchronous messages. Maintain history for context:

```python
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# System prompt for agent behavior
system_prompt = "You are a helpful voice assistant. Keep responses concise (under 100 words) for natural speech."

# The Messages API takes the system prompt as a top-level parameter,
# so the history holds only user and assistant turns.
messages = []


def get_claude_response(user_input):
    """Send the user's text to Claude and record both turns in the history."""
    messages.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=200,
        temperature=0.7,
        system=system_prompt,
        messages=messages,
    )
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})
    return reply
```

**Prompt Tips for Voice:**

- Concise: Cap `max_tokens` and ask for short answers to avoid long speeches.
- Punctuation: Deepgram's `smart_format` punctuates the transcript, which gives Claude cleaner input. Ask Claude for plain sentences so the reply is easy to speak.
- Context Window: Trim history after about 10 exchanges. Claude's 200k-token window can hold far more, but shorter histories keep latency and cost down.
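The history-trimming tip can be sketched as a small helper. This is a minimal illustration, not part of either SDK: the name `trim_history` and the `max_exchanges` parameter are hypothetical, and the only assumption about the API is that the message list should open with a user turn.

```python
def trim_history(messages, max_exchanges=10):
    """Keep only the most recent `max_exchanges` user/assistant pairs."""
    if max_exchanges <= 0:
        return []
    trimmed = messages[-2 * max_exchanges:]
    # The conversation should open with a user turn, so drop any
    # leading assistant message left over from slicing.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Build 12 exchanges, then keep the latest 10
history = []
for i in range(12):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

recent = trim_history(history, max_exchanges=10)
print(len(recent), recent[0]["content"])  # prints: 20 question 2
```

Calling a helper like this right before each `client.messages.create(...)` keeps the request size bounded without touching the system prompt, which is passed separately.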
## Step 4: TTS Output

```python
import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 180)  # Speed (words per minute)
engine.setProperty('volume', 0.9)


def speak(text):
    """Speak the text and block until playback finishes."""
    engine.say(text)
    engine.runAndWait()
```

To change the voice, list the installed voices with `voices = engine.getProperty('voices')` and pick one with `engine.setProperty('voice', voices[1].id)`. The available voices depend on the operating system.

## Step 5: Full Voice Agent Implementation

Combine the pieces with threading so that audio capture, STT, and response generation do not block each other. Use a queue for transcripts:

```python
import os
import queue
import threading
import time

import pyttsx3
from anthropic import Anthropic
from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents, Microphone

transcript_queue = queue.Queue()
stop_event = threading.Event()  # Set when the conversation should end


class VoiceAgent:
    def __init__(self):
        self.anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        self.system_prompt = "You are a helpful voice assistant..."
        self.messages = []  # User/assistant turns only; the system prompt is passed separately
        self.tts_engine = pyttsx3.init()
        self.tts_engine.setProperty('rate', 180)

    def process_transcript(self):
        while not stop_event.is_set():
            try:
                transcript = transcript_queue.get(timeout=1)
            except queue.Empty:
                continue
            if "exit" in transcript.lower():
                stop_event.set()
                break
            response = self.get_response(transcript)
            print(f"Claude: {response}")
            self.speak(response)

    def get_response(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        resp = self.anthropic_client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=200,
            system=self.system_prompt,
            messages=self.messages,
        )
        reply = resp.content[0].text
        self.messages.append({"role": "assistant", "content": reply})
        # Trim history if too long (an even-sized slice keeps a user turn first)
        if len(self.messages) > 20:
            self.messages = self.messages[-10:]
        return reply

    def speak(self, text):
        self.tts_engine.say(text)
        self.tts_engine.runAndWait()


agent = VoiceAgent()


# Deepgram callback
def on_transcript(self, result, **kwargs):
    transcript = result.channel.alternatives[0].transcript.strip()
    if transcript:
        transcript_queue.put(transcript)


# Start Deepgram (assumes deepgram-sdk 3.x)
client = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
options = LiveOptions(
    model="nova-2", language="en", smart_format=True,
    interim_results=False,  # Final results only
    encoding="linear16", channels=1, sample_rate=16000,
)
connection = client.listen.live.v("1")
connection.on(LiveTranscriptionEvents.Transcript, on_transcript)

# Threads
processor_thread = threading.Thread(target=agent.process_transcript, daemon=True)
processor_thread.start()

connection.start(options)
microphone = Microphone(connection.send)
microphone.start()

# Keep alive until "exit" is spoken or Ctrl+C is pressed
try:
    while not stop_event.is_set():
        time.sleep(0.1)
except KeyboardInterrupt:
    stop_event.set()

microphone.finish()
connection.finish()
print("Agent offline.")
```

**Run it:** `python voice_agent.py`. Speak, then listen to Claude's reply through your speakers. Say "exit" to stop.

## Troubleshooting

- **Mic issues:** List audio devices with `python -m sounddevice` (requires `pip install sounddevice`).
- **Latency:** Use `interim_results=True` for faster partials, but ignore very short fragments and send only results marked `is_final` to Claude.
- **Languages:** Deepgram supports 30+ languages; set the `language` option accordingly, e.g. `language="es"` for Spanish.
- **Claude rate limits:** Wrap the API call in `try/except anthropic.RateLimitError` and retry with a delay.

## Enhancements for Production

- **Wake Word:** Integrate Picovoice Porcupine (a "Hey Claude" trigger).
- **Streaming Claude:** Use `stream=True` in the Messages API and start TTS as soon as the first sentence arrives.
- **TTS Upgrade:** ElevenLabs or AWS Polly for more natural voices (Claude can generate SSML).
- **Web Deployment:** Use FastAPI + WebRTC for browser mic capture, or Streamlit + streamlit-webrtc for demos.
- **Multi-Turn Guardrails:** Add a profanity filter or topic enforcement in the system prompt.
- **Logging:** Persist history to SQLite for long sessions.
- **Persistent Memory:** Claude Projects is a claude.ai feature rather than a Messages API parameter, so to carry context across sessions, store the message history yourself and reload it at startup.

## Industry Use Cases

| Industry | Application |
|----------|-------------|
| **HR** | Interview screening bots |
| **Sales** | Lead qualification calls |
| **Support** | 24/7 phone agents |
| **Engineering** | Code review via voice |

## Conclusion

You've built a fully functional voice agent with Claude + Deepgram.
It solves real-time transcription pain points while leveraging Claude's reasoning. Fork the code, deploy a web version to Vercel/Heroku, or extend it with MCP servers for tools.

**Next Steps:**

- Compare Deepgram vs. Whisper: Deepgram's streaming API is built for live speed.
- Add vision: send captured video frames to Claude 3.5 Sonnet as images.

Experiment and share your builds on Claude Directory forums!
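As a starting point for the streaming enhancement listed earlier, here is a minimal sketch of sentence-level buffering, so that TTS can begin before the full reply has arrived. The function name `sentences_from_chunks` and the simulated stream are hypothetical; with the Anthropic SDK, the chunks would instead come from a streaming response (for example, the `text_stream` iterator of `client.messages.stream(...)`). The sentence splitting is deliberately naive and will break on abbreviations such as "Dr. Smith".

```python
import re

# A sentence ends at ., ! or ? followed by whitespace (naive rule)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_chunks(chunks):
    """Yield complete sentences as streamed text chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever remains when the stream ends
    if buffer.strip():
        yield buffer.strip()


# Simulated stream standing in for streamed text deltas from Claude
fake_stream = ["Sure. The wea", "ther is sunny", " today! Anything", " else?"]
spoken = list(sentences_from_chunks(fake_stream))
print(spoken)  # ['Sure.', 'The weather is sunny today!', 'Anything else?']
```

In the agent, each yielded sentence could be passed to `speak()` as soon as it is available, which shortens the pause between the user finishing a question and the first audio of the answer.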
