I Built a 1.6M-Parameter Offline Text-to-Speech Engine for Node.js — Here's How — DeepSeek Blog

Hi everyone! I'm a developer passionate about making AI accessible on low-resource devices. I've been working on speech synthesis for a while, and I want to share a project I've been building: **TinyTTS**.

The idea started from a simple frustration: I needed text-to-speech in a Node.js app, but every option either required Python, called a cloud API, or shipped a massive model. I thought: **what if TTS could be as easy as `npm install` and just work offline?** So I built one from scratch.

## TL;DR

- **1.6M parameters**: the smallest TTS model I know of that still sounds natural
- **~3.4 MB** ONNX model (downloaded automatically on first use)
- **44.1 kHz** output, **~53x real-time** on a laptop CPU
- **Zero Python dependency**: pure Node.js + ONNX Runtime
- **100% G2P (grapheme-to-phoneme) match** with the Python version

```bash
npm install tiny-tts
```

```js
const TinyTTS = require('tiny-tts');
(async () => { // CommonJS has no top-level await, so wrap the calls
  const tts = new TinyTTS();
  await tts.speak('Hello world!', { output: 'hello.wav' }); // writes a WAV file
  await tts.dispose(); // free resources when done
})();
```

---

## The Problem

Most TTS solutions for Node.js fall into one of these categories:

| Approach | Downside |
|----------|----------|
| Cloud APIs (Google, AWS, Azure) | Requires internet, costs money, privacy concerns |
| Python wrapper (Coqui, Bark, etc.) | Needs Python installed; 100 MB–1 GB models |
| System TTS (say.js, espeak) | Robotic quality, platform-dependent |
| WebSocket to a Python server | Extra infrastructure, latency, complexity |

I wanted something that's `npm install` and done: something that runs on a $5 VPS, a Raspberry Pi, or in a CI pipeline with no cloud, no Python, and no hassle.

---

## The Architecture

TinyTTS is an end-to-end VITS-based model compressed down to just **1.62 million parameters**:

```plaintext
Text → G2P → Phoneme IDs → ONNX Model → 44.1 kHz WAV
```

### How small is 1.6M params?
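A quick back-of-the-envelope check relates the parameter count to file size: weight bytes ≈ parameters × bytes per parameter. The post doesn't say how the ONNX weights are stored, so this sketch assumes either 32-bit or 16-bit floats; `weightMegabytes` and `BYTES_PER_PARAM` are illustrative names, not part of tiny-tts.

```javascript
// Rough weight-payload estimate: parameters × bytes per parameter.
// Assumption: the storage precision is not stated in the post, so both
// float32 and float16 are shown. Graph metadata adds a little on top.
const PARAMS = 1.62e6; // parameter count quoted in the post

const BYTES_PER_PARAM = { float32: 4, float16: 2 };

function weightMegabytes(params, bytesPerParam) {
  // Decimal megabytes (1 MB = 1,000,000 bytes)
  return (params * bytesPerParam) / 1e6;
}

for (const [dtype, bytes] of Object.entries(BYTES_PER_PARAM)) {
  console.log(`${dtype}: ${weightMegabytes(PARAMS, bytes).toFixed(2)} MB`);
}
// float32: 6.48 MB
// float16: 3.24 MB
```

The 16-bit estimate (3.24 MB) sits close to the ~3.4 MB download, which would be consistent with half-precision weights plus some graph overhead, but that is an inference from the arithmetic, not something stated here.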
| Model | Parameters | Size |
|-------|-----------:|-----:|
| **TinyTTS** | **1.6M** | **~3.4 MB** |
| Piper | ~63M | ~63 MB |
| Kokoro | 82M | ~330 MB |
| Coqui XTTS | 467M | ~1.8 GB |

### Benchmark (CPU only, same machine)

| Engine | Synthesis time | Audio duration | RTFx |
|--------|---------------:|---------------:|-----:|
| **TinyTTS (ONNX)** | **92 ms** | **4.88 s** | **~53x** |
| Piper (ONNX) | 112 ms | 2.91 s | ~26x |
| Kokoro (ONNX) | 933 ms | 3.16 s | ~3x |

RTFx is audio duration divided by synthesis time, so higher is faster; for TinyTTS, 4.88 s ÷ 0.092 s ≈ 53. The output clips differ in length across engines, so RTFx is the fairer comparison than raw synthesis time.

---

## Usage

### API

```js
const TinyTTS = require('tiny-tts');

(async () => { // CommonJS has no top-level await, so wrap the calls
  const tts = new TinyTTS();
  await tts.speak('Hello world!', { output: 'hello.wav' });
  // speed > 1 speaks faster
  await tts.speak('This is faster.', { output: 'fast.wav', speed: 1.5 });
  await tts.dispose(); // free resources when done
})();
```

### CLI

```bash
npx tiny-tts "The weather is nice today." -o weather.wav
npx tiny-tts "Quick test" -o test.wav --speed 1.3
```

### Python

Also available on PyPI with identical output:

```bash
pip install tiny-tts
```

```python
from tiny_tts import TinyTTS

tts = TinyTTS()
tts.speak("Hello world!", output_path="hello.wav")
```

---

## What's Next

This is just the beginning. Here's what I'm working on:

- **Improve voice quality**: better prosody, more natural intonation, and fewer artifacts, all while keeping the model tiny
- **More voices**: different speakers, genders, and speaking styles
- **Multi-language support**: expanding beyond English to other languages

---

## Links

- **npm**: [npmjs.com/package/tiny-tts](https://www.npmjs.com/package/tiny-tts)
- **PyPI**: [pypi.org/project/tiny-tts](https://pypi.org/project/tiny-tts/)
- **GitHub**: [github.com/tronghieuit/tiny-tts](https://github.com/tronghieuit/tiny-tts)
- **Live Demo**: [huggingface.co/spaces/backtracking/tiny-tts-demo](https://huggingface.co/spaces/backtracking/tiny-tts-demo)

---

If you've read this far, try it out and let me know what you think! I'm especially curious about edge use cases: IoT, CI/CD audio generation, accessibility tools, game dev, and so on.
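For the CI/CD audio generation case, here is a minimal sketch of batch-rendering several prompts with the same `speak(text, { output, speed })` call shown in the Usage section. `renderPrompts` is a hypothetical helper, not part of tiny-tts; it takes the engine as an argument so a stub can stand in for testing.

```javascript
const path = require('path');

// Hypothetical helper (not part of the tiny-tts package): renders each prompt
// to its own WAV file via the speak(text, { output, speed }) call from the
// Usage section. `tts` is any object with that method.
async function renderPrompts(tts, prompts, outDir) {
  const written = [];
  for (const { name, text, speed } of prompts) {
    const output = path.join(outDir, `${name}.wav`);
    // Only pass `speed` when a prompt sets it, leaving the engine's default alone.
    const options = speed === undefined ? { output } : { output, speed };
    // Awaiting inside the loop keeps synthesis sequential on one engine instance.
    await tts.speak(text, options);
    written.push(output);
  }
  return written;
}
```

With the real package, `tts` would be `new TinyTTS()` from `require('tiny-tts')`, and `await tts.dispose()` would run once the loop finishes. The loop is sequential on purpose: nothing here says whether one instance handles concurrent `speak()` calls, so the sketch doesn't assume it.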
