Hi everyone! I'm a developer passionate about making AI accessible on low-resource devices. I've been working on speech synthesis for a while, and I wanted to share a project I've been building: **TinyTTS**.
The idea started from a simple frustration — I needed text-to-speech in a Node.js app, but every option either required Python, called a cloud API, or shipped a massive model. I thought: **what if TTS could be as easy as `npm install` and just work offline?**
So I built one from scratch.
## TL;DR
- **1.6M parameters** — smallest TTS model I know of that still sounds natural
- **~3.4 MB** ONNX model (auto-downloaded on first use)
- **44.1 kHz** output, **~53x real-time** on a laptop CPU
- **Zero Python dependency** — pure Node.js + ONNX Runtime
- **100% G2P (grapheme-to-phoneme) match** with the Python version
```bash
npm install tiny-tts
```
```js
const TinyTTS = require('tiny-tts');
// CommonJS has no top-level await, so wrap the call in an async function.
(async () => {
  const tts = new TinyTTS();
  await tts.speak('Hello world!', { output: 'hello.wav' });
})();
```
---
## The Problem
Most TTS solutions for Node.js fall into one of these categories:
| Approach | Downside |
|----------|----------|
| Cloud APIs (Google, AWS, Azure) | Require internet access, cost money, raise privacy concerns |
| Python wrapper (Coqui, Bark, etc.) | Requires a Python install and 100 MB–1 GB models |
| System TTS (say.js, espeak) | Robotic quality, platform-dependent |
| WebSocket to Python server | Extra infra, latency, complexity |
I wanted something that works right after `npm install`: it should run on a $5 VPS, a Raspberry Pi, or in a CI pipeline, with no cloud, no Python, and no hassle.
---
## The Architecture
TinyTTS is an end-to-end VITS-based model compressed down to just **1.62 million parameters**:
```plaintext
Text → G2P → Phoneme IDs → ONNX Model → 44.1kHz WAV
```
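To make the last arrow of that pipeline concrete, here is a minimal sketch of turning raw samples into a WAV file. This is not TinyTTS's actual code: it assumes the model output is a mono array of floats in [-1, 1] and that 16-bit PCM is the target format, and `encodeWav` is a hypothetical helper name used only for illustration.

```javascript
// Sketch only: encode mono float samples in [-1, 1] as a 16-bit PCM WAV buffer.
// Assumption: mono, 16-bit output. The real TinyTTS output format may differ.
function encodeWav(samples, sampleRate = 44100) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buf = Buffer.alloc(44 + dataSize); // 44-byte canonical header + audio data

  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataSize, 4);                 // remaining file size
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);                           // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20);                            // audio format: 1 = PCM
  buf.writeUInt16LE(1, 22);                            // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * bytesPerSample, 28);  // byte rate
  buf.writeUInt16LE(bytesPerSample, 32);               // block align
  buf.writeUInt16LE(16, 34);                           // bits per sample
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataSize, 40);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling so loud samples do not overflow int16.
    const s = Math.max(-1, Math.min(1, samples[i]));
    buf.writeInt16LE(Math.round(s * 32767), 44 + i * bytesPerSample);
  }
  return buf;
}

// Example: one second of a 440 Hz sine tone at 44.1 kHz.
const tone = Float32Array.from({ length: 44100 }, (_, i) =>
  0.5 * Math.sin((2 * Math.PI * 440 * i) / 44100)
);
const wav = encodeWav(tone);
console.log(`WAV buffer: ${wav.length} bytes`);
```

Writing the returned buffer with `fs.writeFileSync('tone.wav', wav)` should give a file that any standard audio player can open.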
### How small is 1.6M params?
| Model | Parameters | Size |
|-------|-----------|------|
| **TinyTTS** | **1.6M** | **~3.4 MB** |
| Piper | ~63M | ~63 MB |
| Kokoro | 82M | ~330 MB |
| Coqui XTTS | 467M | ~1.8 GB |
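For a rough sense of scale, the table can be reduced to two ratios: how many times more parameters each model has than TinyTTS, and how many bytes each model stores per parameter. The sketch below only does arithmetic on the rounded figures from the table, assuming 1 MB = 10^6 bytes; it says nothing about how any of these models is actually stored or quantized.

```javascript
// Figures copied from the table above (rounded, so the ratios are approximate).
// Assumption: 1 MB = 1e6 bytes and 1 GB = 1e9 bytes.
const models = [
  { name: 'TinyTTS',    params: 1.6e6, bytes: 3.4e6 },
  { name: 'Piper',      params: 63e6,  bytes: 63e6 },
  { name: 'Kokoro',     params: 82e6,  bytes: 330e6 },
  { name: 'Coqui XTTS', params: 467e6, bytes: 1.8e9 },
];

// Bytes of model file per parameter.
function bytesPerParam(model) {
  return model.bytes / model.params;
}

// How many times more parameters `model` has than `baseline`.
function paramRatio(model, baseline) {
  return model.params / baseline.params;
}

const tiny = models[0];
for (const m of models) {
  console.log(
    `${m.name}: ${paramRatio(m, tiny).toFixed(1)}x the parameters, ` +
    `${bytesPerParam(m).toFixed(2)} bytes per parameter`
  );
}
```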
### Benchmark (CPU only, same machine)
| Engine | Synthesis Time | Audio Duration | RTFx (higher is faster) |
|--------|---------------|----------------|------|
| **TinyTTS (ONNX)** | **92 ms** | **4.88 s** | **~53x** |
| Piper (ONNX) | 112 ms | 2.91 s | ~26x |
| Kokoro ONNX | 933 ms | 3.16 s | ~3x |
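The RTFx column is the audio duration divided by the synthesis time, so it can be recomputed from the other two columns. A small sketch, where `rtfx` is a helper name chosen here for illustration and the inputs are the figures from the table:

```javascript
// RTFx (inverse real-time factor): seconds of audio produced per second of compute.
function rtfx(audioSeconds, synthesisMs) {
  if (synthesisMs <= 0) {
    throw new RangeError('synthesisMs must be positive');
  }
  return audioSeconds / (synthesisMs / 1000);
}

// Rows taken from the benchmark table above.
const rows = [
  { engine: 'TinyTTS (ONNX)', synthesisMs: 92,  audioSeconds: 4.88 },
  { engine: 'Piper (ONNX)',   synthesisMs: 112, audioSeconds: 2.91 },
  { engine: 'Kokoro ONNX',    synthesisMs: 933, audioSeconds: 3.16 },
];

for (const row of rows) {
  console.log(`${row.engine}: ~${Math.round(rtfx(row.audioSeconds, row.synthesisMs))}x`);
}
```

The same function works for your own measurements: time a synthesis call, read the duration of the resulting audio, and divide.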
---
## Usage
### API
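```js
const TinyTTS = require('tiny-tts');

// CommonJS has no top-level await, so the calls live in an async function.
async function main() {
  const tts = new TinyTTS();

  // Default speed
  await tts.speak('Hello world!', { output: 'hello.wav' });

  // Faster speech
  await tts.speak('This is faster.', {
    output: 'fast.wav',
    speed: 1.5
  });

  // Clean up when finished
  await tts.dispose();
}

main().catch(console.error);
```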
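If you need to synthesize several lines in one go (for a CI job, say), a small wrapper around `speak` is enough. The sketch below uses only the `speak(text, { output, speed })` call shown above; `synthesizeAll` and the `line-001.wav` naming scheme are hypothetical, and the calls run one at a time because I am not assuming that concurrent `speak` calls on one instance are safe.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Sketch only: synthesize each line to its own WAV file, sequentially.
// `tts` is any object exposing speak(text, { output, speed }), such as a TinyTTS instance.
async function synthesizeAll(tts, lines, outDir, options = {}) {
  await fs.mkdir(outDir, { recursive: true });
  const files = [];
  for (const [i, text] of lines.entries()) {
    const name = `line-${String(i + 1).padStart(3, '0')}.wav`;
    const output = path.join(outDir, name);
    // Forward caller options (e.g. speed) unchanged and set the output path.
    await tts.speak(text, { ...options, output });
    files.push(output);
  }
  return files;
}

// Usage with the real package (commented out so this snippet has no external dependency):
//
// const TinyTTS = require('tiny-tts');
// (async () => {
//   const tts = new TinyTTS();
//   try {
//     await synthesizeAll(tts, ['First line.', 'Second line.'], 'out', { speed: 1.3 });
//   } finally {
//     await tts.dispose(); // clean up even if a speak() call fails
//   }
// })();
```

Passing the instance in as a parameter keeps the helper easy to test with a stub and leaves the caller in charge of `dispose()`.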
### CLI
```bash
npx tiny-tts "The weather is nice today." -o weather.wav
npx tiny-tts "Quick test" -o test.wav --speed 1.3
```
### Python
Also available on PyPI with identical output:
```bash
pip install tiny-tts
```
```python
from tiny_tts import TinyTTS
tts = TinyTTS()
tts.speak("Hello world!", output_path="hello.wav")
```
---
## What's Next
This is just the beginning. Here's what I'm working on:
- **Improve voice quality** — better prosody, more natural intonation, reduce artifacts while keeping the model tiny
- **More voices** — different speakers, genders, and speaking styles
- **Multi-language support** — expanding beyond English to other languages
---
## Links
- **npm**: [npmjs.com/package/tiny-tts](https://www.npmjs.com/package/tiny-tts)
- **PyPI**: [pypi.org/project/tiny-tts](https://pypi.org/project/tiny-tts/)
- **GitHub**: [github.com/tronghieuit/tiny-tts](https://github.com/tronghieuit/tiny-tts)
- **Live Demo**: [huggingface.co/spaces/backtracking/tiny-tts-demo](https://huggingface.co/spaces/backtracking/tiny-tts-demo)
---
If you've read this far — try it out and let me know what you think! I'm especially curious about edge use cases: IoT, CI/CD audio generation, accessibility tools, game dev, etc.