Turn any SIP call into a realtime AI voice agent (OpenAI Realtime / Deepgram / Gemini Live).
# SIP-to-AI
## Why this project exists
Most voice agent frameworks:
- require WebRTC
- rely on heavy dependencies (C / ffmpeg)
- are not telephony-native
This project:
- pure Python SIP + RTP (no C deps)
- direct bridge to realtime AI
- built for call center / telephony scenarios
**SIP-to-AI** — stream RTP audio from **FreeSWITCH / OpenSIPS / Asterisk** directly to **end-to-end realtime voice models**:
- ✅ **OpenAI Realtime API** (gpt-realtime GA)
- ✅ **Deepgram Voice Agent**
- ✅ **Gemini Live** (Gemini 2.5 Flash)
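Simple passthrough bridge: **SIP (G.711 μ-law @ 8kHz)** ↔ **AI voice models**. OpenAI and Deepgram accept G.711 natively, so audio passes through unchanged. Gemini requires PCM16, so audio is resampled in both directions (8kHz → 16kHz toward Gemini, 24kHz → 8kHz back to the caller).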
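For Gemini, each G.711 frame has to be decoded to 16-bit linear PCM and resampled. The sketch below shows one way to do that in pure Python (the `audioop` module, which provided `ulaw2lin` and `ratecv`, was removed from the standard library in Python 3.13). The function names are illustrative rather than this project's API, linear interpolation is a simplifying assumption rather than the resampler the project actually uses, and the output is assumed to be little-endian PCM16.

```python
import struct


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                      # mu-law bytes are transmitted bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the bias (132)
    return -magnitude if sign else magnitude


def resample_linear(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    out: list[int] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate     # fractional position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(round(samples[j] + (nxt - samples[j]) * frac))
    return out


def ulaw_frame_to_pcm16(frame: bytes, dst_rate: int = 16000) -> bytes:
    """Convert an 8 kHz mu-law frame to little-endian PCM16 bytes at dst_rate."""
    pcm = [ulaw_to_pcm16(b) for b in frame]
    pcm = resample_linear(pcm, 8000, dst_rate)
    return struct.pack(f"<{len(pcm)}h", *pcm)
```

A 20 ms G.711 frame (160 bytes at 8 kHz) becomes 320 samples, or 640 bytes, at 16 kHz. The return path (24 kHz PCM16 from Gemini back to the caller) would need the inverse steps: downsampling to 8 kHz and μ-law encoding.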
## Quick Start (OpenAI Realtime)
**Prerequisites:** Python 3.12+ and the uv package manager
**Pure Python, no native dependencies:** SIP and RTP are implemented in pure Python on asyncio, so no C libraries or compilation steps are required.
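To illustrate what a pure-Python RTP layer involves, here is a minimal sketch that parses and builds the fixed 12-byte RTP header defined in RFC 3550 using only `struct`. `RtpPacket`, `parse_rtp`, and `build_rtp` are hypothetical names chosen for this example, not this project's API.

```python
import struct
from dataclasses import dataclass

# Fixed RTP header (RFC 3550): V/P/X/CC byte, M/PT byte, sequence, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


@dataclass
class RtpPacket:
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse an RTP packet, skipping CSRCs, header extension, and padding."""
    if len(data) < RTP_HEADER.size:
        raise ValueError("shorter than the 12-byte RTP header")
    b0, b1, seq, ts, ssrc = RTP_HEADER.unpack_from(data)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = RTP_HEADER.size + 4 * (b0 & 0x0F)   # skip CSRC identifiers
    if b0 & 0x10:                                 # X bit: header extension present
        _profile, words = struct.unpack_from("!HH", data, offset)
        offset += 4 + 4 * words
    end = len(data)
    if b0 & 0x20:                                 # P bit: last byte holds pad length
        end -= data[-1]
    return RtpPacket(b1 & 0x7F, bool(b1 & 0x80), seq, ts, ssrc, data[offset:end])


def build_rtp(payload: bytes, seq: int, ts: int, ssrc: int,
              payload_type: int = 0, marker: bool = False) -> bytes:
    """Build a minimal RTP packet (no CSRCs, extension, or padding)."""
    b0 = 2 << 6                                   # version 2
    b1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(b0, b1, seq & 0xFFFF, ts & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload
```

Payload type 0 is PCMU (G.711 μ-law) in the RFC 3551 audio profile. With the common 20 ms packetization, each packet carries 160 bytes of audio and the timestamp advances by 160 per packet.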
1. **Install dependencies:**
```bash
git clone <repository-url>
cd sip-to-ai
uv venv && source .venv/bin/activate
uv sync
```
2. **Configure environment:**
```bash
cp .env.example .env
```
Edit `.env` with your OpenAI API key:
```bash
# AI Service
AI_VENDOR=openai
OPENAI_API_KEY=sk-proj-your-key-here
OPENAI_MODEL=gpt-realtime
# Agent prompt
AGENT_PROMPT_FILE=agent_prompt.yaml
# SIP Settings (userless account - receive only)
SIP_DOMAIN=192.168.1.100
SIP_TRANSPORT_TYPE=udp
SIP_PORT=6060
```
**Optional:** Create `agent_prompt.yaml` for custom agent personality:
```yaml
instructions: |
  You are a helpful AI assistant. Be concise and friendly.
greeting: "Hello! How can I help you today?"
```
3. **Run the server:**
```bash
uv run python -m app.main
```
The server will listen on `SIP_DOMAIN:SIP_PORT` for incoming calls. Each call creates an independent OpenAI Realtime WebSocket connection.
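The one-connection-per-call design can be sketched with two asyncio tasks per call: one pumps caller audio to the AI session, the other pumps AI audio back toward the RTP sender. `AiSession`, `bridge_call`, and `EchoSession` are hypothetical names used only for illustration; the actual bridge in this project may be structured differently.

```python
import asyncio
from typing import AsyncIterator, Protocol


class AiSession(Protocol):
    """Hypothetical per-call AI connection (e.g. one realtime WebSocket)."""
    async def send_audio(self, chunk: bytes) -> None: ...
    def receive_audio(self) -> AsyncIterator[bytes]: ...
    async def close(self) -> None: ...


async def bridge_call(rtp_in: "asyncio.Queue[bytes | None]",
                      rtp_out: "asyncio.Queue[bytes]",
                      session: AiSession) -> None:
    """Pump audio both ways until either side finishes, then tear down."""
    async def uplink() -> None:
        # Caller audio -> AI. None on the queue signals that the caller hung up.
        while (frame := await rtp_in.get()) is not None:
            await session.send_audio(frame)

    async def downlink() -> None:
        # AI audio -> RTP sender. Ends when the AI stream ends.
        async for chunk in session.receive_audio():
            await rtp_out.put(chunk)

    tasks = [asyncio.create_task(uplink()), asyncio.create_task(downlink())]
    try:
        await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    finally:
        for task in tasks:
            task.cancel()
        results = await asyncio.gather(*tasks, return_exceptions=True)
        await session.close()
    # Re-raise a genuine pump failure; CancelledError is not an Exception subclass.
    for result in results:
        if isinstance(result, Exception):
            raise result


class EchoSession:
    """Test stand-in: echoes audio back and ends after `limit` chunks."""
    def __init__(self, limit: int) -> None:
        self._queue: asyncio.Queue[bytes] = asyncio.Queue()
        self._limit = limit
        self.closed = False

    async def send_audio(self, chunk: bytes) -> None:
        await self._queue.put(chunk)

    async def receive_audio(self) -> AsyncIterator[bytes]:
        for _ in range(self._limit):
            yield await self._queue.get()

    async def close(self) -> None:
        self.closed = True
```

Because each call owns its own session object and pair of tasks, whichever side finishes first (a caller hangup or the end of the AI stream) ends only that call: the other pump is cancelled and the session is closed, without affecting other calls.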
4. **Make a test call:**
```bash
# From FreeSWITCH/Asterisk
```