# Why Build Voice-Enabled Agents with Claude + Deepgram?
Voice interactions are the future of AI agents. Users expect hands-free, natural conversations for tasks like customer support, virtual assistants, or smart home control. Claude excels at reasoning and holding context, but it takes text as input. Pairing it with Deepgram's low-latency speech-to-text (STT) makes real-time voice agents possible.
**Problem:** Many STT services add noticeable lag or transcription errors, which breaks conversational flow. Claude responds best when it receives clean transcripts.
**Solution:** Deepgram's Nova-2 model is built for accurate, low-latency streaming transcription (actual accuracy and latency depend on audio quality and network conditions), and it punctuates transcripts automatically. Stream audio from your mic, transcribe it live, feed the text to Claude's Messages API for contextual replies, and add text-to-speech (TTS) for output.
This guide builds a Python CLI agent that handles continuous conversations. The code is beginner-friendly, and the enhancements section near the end lists what to change for production.
## Architecture Overview
```
Microphone (PyAudio) → Deepgram Live STT (WebSocket) → Transcript → Claude Messages API → Response → TTS (pyttsx3) → Speakers
```
Key benefits:
- **Real-time:** No waiting for full utterances.
- **Contextual:** Claude maintains conversation history.
- **Claude-specific:** Uses Anthropic SDK with system prompts for agent behavior.
- **Offline TTS:** pyttsx3 works without internet.
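The flow in the diagram can be sketched as a loop over three pluggable stages. This is a minimal illustration, not the final agent: `run_pipeline`, `reply_fn`, and `speak_fn` are hypothetical names, and plain Python callables stand in for Deepgram, Claude, and pyttsx3 so the control flow is visible without any API keys.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    transcripts: Iterable[str],
    reply_fn: Callable[[str], str],
    speak_fn: Callable[[str], None],
    stop_word: str = "exit",
) -> int:
    """Drive one conversation: transcript -> reply -> speech.

    Returns the number of turns that were answered.
    """
    turns = 0
    for text in transcripts:
        text = text.strip()
        if not text:
            continue  # Ignore empty transcripts (silence, noise)
        if stop_word in text.lower():
            break  # The user asked to end the conversation
        speak_fn(reply_fn(text))
        turns += 1
    return turns


# Stand-ins for the real stages (STT output, Claude, TTS)
spoken: List[str] = []
handled = run_pipeline(
    ["Hello there.", "", "What time is it?", "exit", "ignored"],
    reply_fn=lambda t: f"You said: {t}",
    speak_fn=spoken.append,
)
print(handled, spoken)
```

Keeping each stage behind a plain function makes it easy to swap the TTS engine or test the loop offline later.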
## Prerequisites
1. **API Keys:**
- [Deepgram](https://console.deepgram.com/signup): Free credits for new accounts (check the console for current limits).
- [Anthropic](https://console.anthropic.com/): Claude API key.
2. **Python Environment:** Python 3.10+.
3. **Hardware:** Microphone and speakers.
## Step 1: Install Dependencies
Create a virtual environment and install:
```bash
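python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install anthropic "deepgram-sdk>=3,<4" pyaudio pyttsx3 python-dotenv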
```
**Notes:**
- `pyaudio` may need system deps (e.g., `apt install portaudio19-dev` on Ubuntu).
- Deepgram SDK handles WebSocket + mic streaming.
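- `pyttsx3` provides cross-platform, offline TTS.
- `python-dotenv` loads the API keys from a `.env` file (the code in this guide calls `load_dotenv()`).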
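Before moving on, it can help to confirm that the packages import and the keys are set. The check below is a small sketch using only the standard library. The helper names are hypothetical, and the environment variable names `DEEPGRAM_API_KEY` and `ANTHROPIC_API_KEY` are assumed to match the ones used in the rest of this guide.

```python
import importlib.util
import os

# Import names (not pip names) of the packages this guide relies on
REQUIRED_MODULES = ["anthropic", "deepgram", "pyaudio", "pyttsx3"]
REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "ANTHROPIC_API_KEY"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_keys(names, env=None):
    """Return the environment variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [n for n in names if not env.get(n)]


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("Missing keys:", missing_keys(REQUIRED_KEYS) or "none")
```

If you keep keys in a `.env` file, call `load_dotenv()` before running the key check.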
## Step 2: Deepgram Live Transcription Setup
Deepgram's Python SDK simplifies mic → WS streaming. Here's a basic live listener:
```python
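import os

from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents, Microphone
from dotenv import load_dotenv

load_dotenv()  # Reads DEEPGRAM_API_KEY from a .env file

# Written against deepgram-sdk 3.x; other major versions may differ
client = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])


def on_message(self, result, **kwargs):
    """Print each non-empty transcript as it arrives."""
    if result.channel and result.channel.alternatives:
        transcript = result.channel.alternatives[0].transcript
        if transcript:
            print(f"Transcript: {transcript}")
            # TODO: Send to Claude


options = LiveOptions(
    model="nova-2",
    language="en",
    smart_format=True,
    interim_results=True,  # Partial transcripts for speed
    encoding="linear16",  # Raw PCM, matching the SDK's Microphone helper
    channels=1,
    sample_rate=16000,
)

connection = client.listen.live.v("1")
connection.on(LiveTranscriptionEvents.Transcript, on_message)
connection.start(options)

# Stream microphone audio into the open connection
microphone = Microphone(connection.send)
microphone.start()
input("Listening. Press Enter to stop.\n")
microphone.finish()
connection.finish()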
```
This connects to `wss://api.deepgram.com/v1/listen` and processes utterances with punctuation.
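With `interim_results=True`, the callback fires several times per utterance, and early partials may still change. Deepgram's live results include `is_final` and `speech_final` flags, so one way to avoid sending half-sentences to Claude is to buffer finalized segments until the speaker pauses. The `UtteranceBuffer` class below is a hypothetical helper, shown with plain values rather than real SDK result objects.

```python
class UtteranceBuffer:
    """Collect finalized transcript segments into complete utterances."""

    def __init__(self):
        self._parts = []

    def add(self, transcript, is_final, speech_final=False):
        """Record one result; return the full utterance once speech ends."""
        if not is_final:
            return None  # Interim result: may still be revised, so skip it
        text = transcript.strip()
        if text:
            self._parts.append(text)
        if speech_final and self._parts:
            utterance = " ".join(self._parts)
            self._parts = []  # Start fresh for the next utterance
            return utterance
        return None


buf = UtteranceBuffer()
print(buf.add("hello the", is_final=False))              # interim, ignored
print(buf.add("Hello there,", is_final=True))            # finalized, buffered
print(buf.add("how are you?", is_final=True, speech_final=True))
```

Inside a real callback you would pass `result.channel.alternatives[0].transcript`, `result.is_final`, and `result.speech_final`, and enqueue whatever non-`None` value comes back.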
## Step 3: Claude Integration with Conversation History
Use Anthropic's SDK for streaming or synchronous messages. Maintain history for context:
```python
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# System prompt for agent behavior
system_prompt = "You are a helpful voice assistant. Keep responses concise (under 100 words) for natural speech."

# The Messages API takes the system prompt as a top-level parameter,
# so the history holds only user and assistant turns.
messages = []


def get_claude_response(user_input):
    """Send one user turn to Claude and record both sides in the history."""
    messages.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=200,
        temperature=0.7,
        system=system_prompt,
        messages=messages,
    )
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})
    return reply
```
**Prompt Tips for Voice:**
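- Concise: Cap `max_tokens` and ask for short answers to avoid long speeches.
- Punctuation: Deepgram's `smart_format` punctuates the transcript, which gives Claude cleaner input. Ask Claude for plain sentences without markdown or symbols so the reply is easy to speak.
- Context window: Trim history after about 10 exchanges. Claude's 200k-token context window can hold far more, but a shorter history keeps latency and cost down.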
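The trimming tip can be sketched as a small helper. This is one possible approach, assuming history is stored as a list of `{"role", "content"}` dicts with user and assistant turns only. `trim_history` is a hypothetical name. It also makes sure the trimmed list still starts with a user turn, which the Messages API expects.

```python
def trim_history(messages, max_exchanges=10):
    """Keep at most the last `max_exchanges` user/assistant pairs."""
    max_messages = max_exchanges * 2
    if len(messages) <= max_messages:
        return list(messages)
    trimmed = messages[-max_messages:]
    # Drop leading assistant turns so the history begins with a user turn
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Build 15 exchanges (30 messages), then trim to the last 10 exchanges
history = []
for i in range(15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

recent = trim_history(history)
print(len(recent), recent[0]["content"])
```

Call it on the history just before each request, so a pending user turn at the end is handled too.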
## Step 4: TTS Output
```python
import pyttsx3

engine = pyttsx3.init()
engine.setProperty('rate', 180)  # Speed
engine.setProperty('volume', 0.9)


def speak(text):
    engine.say(text)
    engine.runAndWait()  # Blocks until the speech has finished
```
To change the voice, list the installed ones with `voices = engine.getProperty('voices')` and select one with `engine.setProperty('voice', voices[1].id)`. The available voices vary by operating system.
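Indexing with `voices[1]` fails on machines that have only one voice installed, so selecting by name is a little safer. The sketch below assumes each voice object exposes `name` and `id` attributes, as pyttsx3 voices do. `pick_voice_id` is a hypothetical helper, and the demo uses stand-in objects so it runs without a TTS engine.

```python
from types import SimpleNamespace


def pick_voice_id(voices, keyword, default=None):
    """Return the id of the first voice whose name contains `keyword`."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice.id
    return default  # No match: the caller keeps the engine's current voice


# Stand-ins for the objects returned by engine.getProperty('voices')
fake_voices = [
    SimpleNamespace(id="voice.alpha", name="Alpha English"),
    SimpleNamespace(id="voice.beta", name="Beta Spanish"),
]
chosen = pick_voice_id(fake_voices, "spanish")
print(chosen)
```

With a real engine, pass `engine.getProperty('voices')` and call `engine.setProperty('voice', chosen)` only when `chosen` is not `None`.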
## Step 5: Full Voice Agent Implementation
Combine with threading for non-blocking audio/STT/response. Use a queue for transcripts:
```python
import os
import queue
import threading
import time

import pyttsx3
from anthropic import Anthropic
from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents, Microphone
from dotenv import load_dotenv

load_dotenv()  # Reads DEEPGRAM_API_KEY and ANTHROPIC_API_KEY from .env

transcript_queue = queue.Queue()
stop_event = threading.Event()  # Set when the conversation should end


class VoiceAgent:
    def __init__(self):
        self.anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        self.system_prompt = "You are a helpful voice assistant..."
        # User/assistant turns only; the system prompt is passed separately
        self.messages = []
        self.tts_engine = pyttsx3.init()
        self.tts_engine.setProperty('rate', 180)

    def process_transcript(self):
        """Consume transcripts from the queue until told to stop."""
        while not stop_event.is_set():
            try:
                transcript = transcript_queue.get(timeout=1)
            except queue.Empty:
                continue
            if "exit" in transcript.lower():
                stop_event.set()
                break
            response = self.get_response(transcript)
            print(f"Claude: {response}")
            self.speak(response)

    def get_response(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        resp = self.anthropic_client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=200,
            system=self.system_prompt,
            messages=self.messages,
        )
        reply = resp.content[0].text
        self.messages.append({"role": "assistant", "content": reply})
        # Trim history if too long; an even slice keeps a user turn first
        if len(self.messages) > 20:
            self.messages = self.messages[-10:]
        return reply

    def speak(self, text):
        self.tts_engine.say(text)
        self.tts_engine.runAndWait()


agent = VoiceAgent()


# Deepgram callback (written against deepgram-sdk 3.x)
def on_transcript(self, result, **kwargs):
    transcript = result.channel.alternatives[0].transcript.strip()
    if transcript:
        transcript_queue.put(transcript)


# Start Deepgram
client = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
options = LiveOptions(
    model="nova-2", language="en", smart_format=True,
    interim_results=False,  # Final results only
    encoding="linear16", channels=1, sample_rate=16000,
)
connection = client.listen.live.v("1")
connection.on(LiveTranscriptionEvents.Transcript, on_transcript)

# Threads
processor_thread = threading.Thread(target=agent.process_transcript, daemon=True)
processor_thread.start()
connection.start(options)
microphone = Microphone(connection.send)  # Streams mic audio to Deepgram
microphone.start()

# Keep alive
try:
    while not stop_event.is_set():
        time.sleep(0.1)
except KeyboardInterrupt:
    stop_event.set()
finally:
    microphone.finish()
    connection.finish()
print("Agent offline.")
```
**Run it:** `python voice_agent.py`. Speak into the microphone, and Claude's reply plays through your speakers. Say "exit" to stop.
## Troubleshooting
- **Mic issues:** List your audio devices with `python -m sounddevice` (requires `pip install sounddevice`).
- **Latency:** Use `interim_results=True` for faster partials, but filter out short or non-final ones before sending them to Claude.
- **Languages:** Deepgram supports 30+ languages; set `language="es"` for Spanish, for example.
- **Claude rate limits:** Catch `anthropic.RateLimitError` and retry with backoff.
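The rate-limit advice can be sketched as a generic retry helper with exponential backoff. `call_with_retries` is a hypothetical name, and the exception type is passed in so the example runs without the Anthropic SDK. In the agent you would pass `anthropic.RateLimitError` as `retry_on`.

```python
import random
import time


def call_with_retries(fn, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on `retry_on` errors, wait 1s, 2s, 4s... and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            # Up to 10% jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a stand-in error type and a function that fails twice
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated rate limit")
    return "ok"


result = call_with_retries(flaky_call, retry_on=TimeoutError, sleep=lambda s: None)
print(result, attempts["count"])
```

In the agent, the call could look like `call_with_retries(lambda: agent.get_response(text), retry_on=anthropic.RateLimitError)`. The Anthropic client also accepts a `max_retries` setting, which may be enough for light usage.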
## Enhancements for Production
- **Wake Word:** Integrate Picovoice Porcupine ("Hey Claude" trigger).
- **Streaming Claude:** Use `stream=True` in Messages API for progressive TTS.
- **TTS Upgrade:** ElevenLabs or AWS Polly for natural voices (Claude can generate SSML).
- **Web Deployment:** Use FastAPI + WebRTC for browser mic. Streamlit + streamlit-webrtc for demos.
- **Multi-Turn Guardrails:** Add profanity filter or topic enforcement in system prompt.
- **Logging:** Persist history to SQLite for long sessions.
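For the streaming idea in the list above, the TTS engine needs whole sentences while Claude's streamed reply arrives in small text fragments. One possible approach is to buffer the fragments and release each sentence as soon as it is complete. `SentenceChunker` is a hypothetical helper, and the sentence boundary rule (`.`, `!`, or `?` followed by whitespace) is a simplifying assumption that will misfire on abbreviations.

```python
import re

# A sentence ends at ., !, or ? followed by whitespace (a rough heuristic)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceChunker:
    """Turn a stream of text fragments into complete sentences."""

    def __init__(self):
        self._buffer = ""

    def feed(self, delta):
        """Add a fragment; return any sentences that are now complete."""
        self._buffer += delta
        parts = _SENTENCE_END.split(self._buffer)
        self._buffer = parts.pop()  # The last piece may be unfinished
        return [p.strip() for p in parts if p.strip()]

    def flush(self):
        """Return whatever text remains once the stream has ended."""
        rest, self._buffer = self._buffer.strip(), ""
        return [rest] if rest else []


chunker = SentenceChunker()
for fragment in ["Hello th", "ere. How are", " you? I am", " fine"]:
    for sentence in chunker.feed(fragment):
        print("speak:", sentence)
for sentence in chunker.flush():
    print("speak:", sentence)
```

Each returned sentence can be passed to `speak()` while the rest of the reply is still streaming, which shortens the wait before the first audio.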
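**Persistent memory across sessions:** The Messages API is stateless, so each request must include the history you want Claude to see. Store past turns yourself (for example, in the SQLite log above) and resend the relevant ones. Claude Projects is a feature of the Claude apps, and this guide does not rely on it.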
## Industry Use Cases
| Industry | Application |
|----------|-------------|
| **HR** | Interview screening bots |
| **Sales**| Lead qualification calls |
| **Support** | 24/7 phone agents |
| **Engineering** | Code review via voice |
## Conclusion
You've built a working voice agent with Claude + Deepgram. It pairs real-time transcription with Claude's reasoning. From here, fork the code, move audio capture to the browser for a hosted deployment (e.g., Vercel or Heroku), or extend the agent with MCP servers for tools.
**Next Steps:**
- Compare Deepgram vs. Whisper: Deepgram's streaming API is built for live use, while Whisper is typically run on complete audio.
- Add vision: Send sampled video frames as images to Claude 3.5 Sonnet.
Experiment and share your builds on Claude Directory forums!