Claude for Developers

Building Voice Agents with Claude API and ElevenLabs: Conversational AI Guide

Claude Directory January 15, 2026

Build natural voice agents combining Claude API's superior reasoning with ElevenLabs' lifelike TTS. This end-to-end guide creates a conversational web app with STT, AI chat, and speech synthesis.

Introduction

Voice agents are transforming how users interact with AI, enabling hands-free, natural conversations in web apps. By integrating the Claude API from Anthropic with ElevenLabs' state-of-the-art text-to-speech (TTS), you can create responsive voice assistants that leverage Claude's exceptional reasoning capabilities—especially Claude 3.5 Sonnet—for context-aware responses.

This tutorial provides a complete working example: a web app where users speak via microphone, speech is transcribed using the browser's Web Speech API, processed by Claude for intelligent replies, and voiced back via ElevenLabs. We'll use Node.js for the backend to securely handle API keys and maintain conversation history. The example keeps sessions in memory, so treat it as a starting point to harden before production use.

Key features:

Real-time speech-to-text (STT) with Web Speech API
Stateful conversations with Claude API
Lifelike TTS streaming from ElevenLabs
Session-based chat history
Deployable to platforms like Vercel or Render

Expect roughly 1-2 seconds of end-to-end latency for short replies, depending on network conditions and the models you choose, along with natural prosody. This setup suits developers building customer support bots, virtual assistants, or interactive demos.

Prerequisites

Before starting:

Node.js 18+ installed
Anthropic API key: Sign up at console.anthropic.com and generate a key (Claude 3.5 Sonnet recommended)
ElevenLabs account and API key: Register at elevenlabs.io, get your key, and note a voice ID (e.g., '21m00Tcm4TlvDq8ikWAM' for 'Rachel')
Basic JavaScript knowledge
Text editor (VS Code) and terminal

Free-tier allowances are enough for prototyping this app, but trial credits and quotas change, so check Anthropic's and ElevenLabs' current pricing pages before you start.

Step 1: Project Setup

Create a new directory and initialize the project:

mkdir claude-voice-agent
cd claude-voice-agent
npm init -y
npm install express @anthropic-ai/sdk elevenlabs cors dotenv uuid

Create a .env file for secrets:

ANTHROPIC_API_KEY=your_anthropic_key_here
ELEVENLABS_API_KEY=your_elevenlabs_key_here
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM  # Replace with your preferred voice
PORT=3000
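
A missing or placeholder key otherwise only shows up as a 500 error on the first request. As a minimal sketch, a startup check could catch it earlier. The helper names missingEnvVars and assertEnv are hypothetical and not part of the original project, and the 'your_' prefix test assumes the placeholder values shown above:

```javascript
// Hypothetical helpers (not from the original tutorial): report required
// environment variables that are unset, blank, or still set to a placeholder.
function missingEnvVars(names, env = process.env) {
  return names.filter((name) => {
    const value = env[name];
    // Treat undefined, empty, whitespace-only, and "your_..." placeholders as missing
    return !value || value.trim() === '' || value.startsWith('your_');
  });
}

function assertEnv(names, env = process.env) {
  const missing = missingEnvVars(names, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}

// Example: call once at startup, after require('dotenv').config()
// assertEnv(['ANTHROPIC_API_KEY', 'ELEVENLABS_API_KEY', 'ELEVENLABS_VOICE_ID']);
```

Failing at startup keeps configuration mistakes out of the request path, where they are harder to tell apart from real API errors.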

These packages provide:

express: Web server
@anthropic-ai/sdk: Official Claude API client
elevenlabs: JS SDK for TTS
cors: Enable browser requests
dotenv: Load env vars
uuid: Generate session IDs

Step 2: Backend Implementation

Create server.js for the Express server. It handles two endpoints: /chat for Claude conversations and /tts for speech synthesis. We use an in-memory Map for session history (use Redis for production).

const express = require('express');
const cors = require('cors');
const { Anthropic } = require('@anthropic-ai/sdk');
// The elevenlabs package exports its client as ElevenLabsClient (check the README for your installed version)
const { ElevenLabsClient } = require('elevenlabs');
const { v4: uuidv4 } = require('uuid');
require('dotenv').config();

const app = express();
app.use(cors());
app.use(express.json());
app.use(express.static('public'));  // Serve frontend

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const elevenlabs = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });

const sessions = new Map();  // sessionId -> [{role, content}]

// System prompt optimized for voice: concise, engaging
const SYSTEM_PROMPT = "You are a helpful voice assistant. Respond concisely (under 80 words), naturally, and engagingly. Use simple language.";

app.post('/chat', async (req, res) => {
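  // Fall back to a fresh session ID if the client did not send one
  const { message, sessionId = uuidv4() } = req.body;
  if (typeof message !== 'string' || !message.trim()) {
    return res.status(400).json({ error: 'message is required' });
  }
  const history = sessions.get(sessionId) || [];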

  history.push({ role: 'user', content: message });

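  // Keep roughly the last 10 exchanges to limit prompt size, then drop any
  // leading assistant turn so the window starts with a user message
  const windowed = history.slice(-20);
  const recentHistory = windowed.slice(windowed.findIndex((m) => m.role === 'user'));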

  try {
    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20240620',
      max_tokens: 500,
      system: SYSTEM_PROMPT,
      messages: recentHistory,
    });

    const assistantMessage = response.content[0].text;
    history.push({ role: 'assistant', content: assistantMessage });
    sessions.set(sessionId, history);

    res.json({ reply: assistantMessage, sessionId });
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: 'Claude API error' });
  }
});

app.post('/tts', async (req, res) => {
  const { text, voiceId = process.env.ELEVENLABS_VOICE_ID } = req.body;

  try {
    const audio = await elevenlabs.generate({
      voice: voiceId,
      text,
      model_id: 'eleven_monolingual_v1',
      output_format: 'mp3_44100_128',  // Web-friendly
    });

    res.set({
      'Content-Type': 'audio/mpeg',
      'Cache-Control': 'no-cache',
    });
    // Assumes generate() resolves to a Node readable stream, as in the elevenlabs SDK README
    audio.pipe(res);
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: 'TTS error' });
  }
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Server running on http://localhost:${PORT}`));

Key Claude-specific notes:

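claude-3-5-sonnet-20240620 is used here for its instruction-following in conversation. Model IDs are retired over time, so confirm the current ID in Anthropic's model documentation before deploying.
The system prompt enforces brevity, which is critical for voice because long replies mean long waits before audio plays.
Trimming history keeps each request small and well within Claude's 200k-token context window, which also holds down cost and latency.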
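
The history trimming can be pulled into a small function that is testable on its own. The sketch below uses a hypothetical helper name, trimHistory. It assumes messages are stored as { role, content } objects, as in server.js, and that the request should start with a user turn:

```javascript
// Hypothetical helper: keep at most maxMessages recent messages and make sure
// the returned window starts with a user message.
function trimHistory(history, maxMessages = 20) {
  const recent = history.slice(-maxMessages);
  const firstUser = recent.findIndex((m) => m.role === 'user');
  // If no user message is left in the window, send nothing rather than an invalid list
  return firstUser === -1 ? [] : recent.slice(firstUser);
}

// Example: 21 alternating messages ending on a user turn
const sample = Array.from({ length: 21 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i}`,
}));
const trimmed = trimHistory(sample, 20);
console.log(trimmed.length, trimmed[0].role);
```

A plain slice(-20) on this sample would start with an assistant message, which is why the helper drops leading assistant turns after slicing.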

Run with node server.js.

Step 3: Frontend Implementation

Create a public folder with index.html:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Claude Voice Agent</title>
  <style>
    body { font-family: Arial, sans-serif; max-width: 600px; margin: 0 auto; padding: 20px; }
    button { padding: 10px 20px; font-size: 16px; margin: 10px; }
    #status { color: #666; margin: 10px 0; }
    #conversation { border: 1px solid #ddd; height: 300px; overflow-y: scroll; padding: 10px; }
  </style>
</head>
<body>
  <h1>🤖 Claude Voice Agent</h1>
  <button id="startBtn">🎤 Start Listening</button>
  <button id="stopBtn" disabled>⏹️ Stop</button>
  <div id="status">Click start to speak!</div>
  <div id="conversation"></div>
  <audio id="audio" autoplay></audio>

  <script>
    const SERVER_URL = 'http://localhost:3000';
    let recognition, sessionId = crypto.randomUUID();
    const statusEl = document.getElementById('status');
    const convEl = document.getElementById('conversation');
    const audioEl = document.getElementById('audio');

    // Web Speech API (Chrome/Edge best support)
    const SpeechRecognition = window.webkitSpeechRecognition || window.SpeechRecognition;
    recognition = new SpeechRecognition();
    recognition.continuous = false;
    recognition.interimResults = false;
    recognition.lang = 'en-US';

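    document.getElementById('startBtn').onclick = () => {
      recognition.start();
      statusEl.textContent = 'Listening... Speak now!';
      // Toggle buttons so Stop is usable while listening
      document.getElementById('startBtn').disabled = true;
      document.getElementById('stopBtn').disabled = false;
    };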

    document.getElementById('stopBtn').onclick = () => {
      recognition.stop();
    };

    recognition.onresult = async (event) => {
      const transcript = event.results[0][0].transcript;
      addMessage('You', transcript);

      const response = await fetch(`${SERVER_URL}/chat`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: transcript, sessionId }),
      });
      const { reply } = await response.json();
      addMessage('Claude', reply);

      // Generate and play TTS
      const audioBlob = await fetch(`${SERVER_URL}/tts`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ text: reply }),
      }).then(r => r.blob());

      audioEl.src = URL.createObjectURL(audioBlob);
      statusEl.textContent = 'Playing response...';
    };

    recognition.onend = () => {
      statusEl.textContent = 'Processing complete. Click start to speak again.';
      document.getElementById('startBtn').disabled = false;
      document.getElementById('stopBtn').disabled = true;
    };

    function addMessage(speaker, text) {
      const div = document.createElement('div');
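      // Build nodes instead of using innerHTML so transcripts and replies are never parsed as HTML
      const label = document.createElement('strong');
      label.textContent = `${speaker}: `;
      div.append(label, text, document.createElement('br'));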
      convEl.appendChild(div);
      convEl.scrollTop = convEl.scrollHeight;
    }
  </script>
</body>
</html>

Frontend highlights:

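Web Speech API: Free, browser-native STT. This example waits for the final transcript (interimResults is false); set it to true if you want live captions while the user speaks.
Session persistence: A UUID generated on page load ties the turns of one conversation together.
Audio playback: The MP3 response is loaded as a Blob and played through an object URL, so playback starts once the full file has arrived.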
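
Browsers without the Web Speech API will throw as soon as the page calls new SpeechRecognition(). A minimal sketch of feature detection follows. The helper name getSpeechRecognition is hypothetical, and it takes the global object as a parameter so it can also be exercised outside a browser:

```javascript
// Hypothetical helper: return the SpeechRecognition constructor exposed by the
// given global object (standard name first, then the webkit prefix), or null.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Browser usage (assumed wiring; statusEl and startBtn are the elements from index.html):
// const Recognition = getSpeechRecognition(window);
// if (!Recognition) {
//   statusEl.textContent = 'Speech recognition is not supported in this browser.';
//   document.getElementById('startBtn').disabled = true;
// } else {
//   recognition = new Recognition();
// }
```

Disabling the start button when the API is missing gives users a clear message in place of a silent script error.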

Test: run node server.js and visit http://localhost:3000. Speak a query like "What's the weather like?" and Claude answers in conversation, though without live weather data unless you add tools.

Step 4: Customization and Best Practices

Claude Models: Swap to claude-3-haiku-20240307 for faster/cheaper responses.
ElevenLabs Voices: List via API or dashboard. Try multilingual models.
Tool Use: Give Claude tools for functions (e.g., a weather API); see Anthropic docs.
Error Handling: Add retries with exponential backoff.
Latency Tips: Use streaming (stream: true in Claude) for partial responses; pipe to ElevenLabs WebSocket TTS.
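Privacy: Raw audio never reaches your own server, only the transcript does. Note that Chrome's Web Speech API implementation sends audio to a server-based recognition service, so speech is not processed purely on-device.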
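
When Claude's reply is streamed, text arrives in small fragments, while TTS usually sounds better when fed whole sentences. One possible way to bridge the two is a sentence buffer like the sketch below. The name createSentenceBuffer is hypothetical, and the boundary rule (end punctuation followed by whitespace) is a simplifying assumption that will also split after abbreviations such as "e.g.":

```javascript
// Hypothetical helper: accumulate streamed text fragments and call onSentence
// each time a complete sentence is available.
function createSentenceBuffer(onSentence) {
  let buffer = '';
  const boundary = /[.!?]+(?=\s)/; // end punctuation followed by whitespace

  return {
    // Add a fragment and emit every complete sentence now in the buffer
    push(delta) {
      buffer += delta;
      let match;
      while ((match = boundary.exec(buffer)) !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(0, end).trim();
        if (sentence) onSentence(sentence);
        buffer = buffer.slice(end);
      }
    },
    // Emit whatever is left once the stream has finished
    flush() {
      const rest = buffer.trim();
      if (rest) onSentence(rest);
      buffer = '';
    },
  };
}

// Example with hand-written fragments standing in for streamed text
const spoken = [];
const sentences = createSentenceBuffer((s) => spoken.push(s));
['Hello the', 're. How are', ' you? Fine'].forEach((part) => sentences.push(part));
sentences.flush();
console.log(spoken);
```

Each emitted sentence could then be sent to TTS while Claude is still generating the rest, so audio can start before the full reply is done.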

Production Upgrades:

Redis for sessions
Authentication (JWT)
Deepgram/Whisper for better STT
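
The retry advice under Error Handling above can be sketched as a small wrapper. The names withRetry and defaultSleep are hypothetical, and the retry count and base delay are illustrative defaults, not recommendations from either API. The Anthropic SDK already retries some failures on its own (its client accepts a maxRetries option), so a helper like this is mostly useful around the TTS call:

```javascript
// Hypothetical helper: run an async function, retrying failures with
// exponentially growing delays (baseDelayMs, then 2x, then 4x, ...).
async function withRetry(fn, { retries = 3, baseDelayMs = 250, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Default delay implementation; injectable so tests do not have to wait
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Assumed usage inside the /tts handler:
// const audio = await withRetry(() => elevenlabs.generate({ voice: voiceId, text }));
```

In practice you would likely retry only transient failures such as timeouts or rate limits, and add random jitter to the delay so many clients do not retry in lockstep.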

Step 5: Deployment

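Push to GitHub and deploy the backend to Render or Vercel, setting the variables from .env in the platform's dashboard. Before deploying, change SERVER_URL in index.html from http://localhost:3000 to your deployed origin (or to an empty string when the same server hosts the frontend). Because server.js serves the public folder itself, a single Render web service can host both the frontend and the API. On Vercel, routes run as serverless functions and the in-memory session Map does not persist between invocations, so move sessions to an external store first.

Example vercel.json (assumes the /chat handler has been moved into an api/chat function):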

{
  "rewrites": [{ "source": "/chat", "destination": "/api/chat" }]
}

Conclusion

You've built a fully functional voice agent with Claude's reasoning edge and ElevenLabs' human-like speech. Experiment with prompts, voices, and integrations like Slack. For enterprise, scale with Claude Team plans. Share your builds in comments!
