---
title: TERSE — A New Serialization Format Built for LLMs
published: true
description: JSON carries ~47% structural noise on every LLM API call. TERSE is a new open-source serialization format that covers the full JSON data model with 30–55% fewer tokens. Formal spec, 4 reference implementations, live playground.
tags: llm, opensource, ai, token
---
*JSON is the default. But defaults were built for a different world.*
---
Every time you send structured data to a Large Language Model, you pay for it token by token. And if you're using JSON — which almost everyone is — you're paying for a lot of characters that carry no information.
Take this simple payload:
```json
{
  "user_id": 1001,
  "status": "active",
  "data": ["feature_a", "feature_b"],
  "verified": true
}
```
Count the noise: braces, quotes around every key and string value, commas, colons with spaces. Now imagine this multiplied across thousands of API calls per day. That's real money.
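To put a rough number on that noise, here is a minimal sketch that measures how much of the payload above is pure punctuation. It counts characters, not tokens, so it is only a proxy: real token counts depend on the tokenizer your model uses. The helper name `structural_share` is made up for this illustration.

```python
import json

payload = {
    "user_id": 1001,
    "status": "active",
    "data": ["feature_a", "feature_b"],
    "verified": True,
}

# JSON punctuation that carries structure but no payload data.
STRUCTURAL = set('{}[]":,')


def structural_share(text: str) -> float:
    """Fraction of non-whitespace characters that are JSON punctuation."""
    visible = [ch for ch in text if not ch.isspace()]
    return sum(ch in STRUCTURAL for ch in visible) / len(visible)


# Use the most compact JSON form so whitespace does not inflate the result.
compact = json.dumps(payload, separators=(",", ":"))
print(f"{structural_share(compact):.0%} of the characters are punctuation")
```

Even in its most compact form, a sizeable share of this payload is braces, quotes, commas, and colons.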
I built **TERSE** to address this.
---
## What is TERSE?
**TERSE** (Token-Efficient Recursive Serialization Encoding) is a text-based data serialization format that represents the complete JSON data model with 30–55% fewer tokens, which makes structured data cheaper to send as input to Large Language Models.
The same payload in TERSE:
```yaml
user_id: 1001
status: active
data: [feature_a feature_b]
verified: T
```
Same information. ~47% fewer tokens.
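To make the mapping concrete, here is a toy encoder that reproduces the TERSE listing above for flat payloads. It is a sketch, not the `terse-py` implementation: the function names are invented for this post, and the rule for when a string may stay unquoted (the `BARE` pattern) is a simplifying assumption. The spec is the authority on quoting.

```python
import json
import re

# Assumption: strings matching this pattern are safe to emit without quotes.
BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
RESERVED = {"T", "F"}  # bare T/F would be read back as booleans


def encode_value(value):
    """Encode a scalar or a list of scalars."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, list):
        return "[" + " ".join(encode_value(v) for v in value) + "]"
    if isinstance(value, str):
        if BARE.fullmatch(value) and value not in RESERVED:
            return value
        return json.dumps(value)  # fall back to a JSON-style quoted string
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_flat(obj):
    """Encode a flat dict as one `key: value` line per entry."""
    return "\n".join(f"{key}: {encode_value(val)}" for key, val in obj.items())


payload = {
    "user_id": 1001,
    "status": "active",
    "data": ["feature_a", "feature_b"],
    "verified": True,
}
print(encode_flat(payload))
```

Strings that contain spaces, such as `"two words"`, fall through to the quoted branch, so the quotes only appear where they are needed.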
---
## How it compares
| Format | Token savings vs JSON | Full JSON coverage? |
|--------|----------------------|---------------------|
| JSON | baseline | ✓ |
| YAML | ~20% | ✓ (verbose arrays) |
| TOON | ~40% | ✗ (flat data only) |
| **TERSE** | **~47%** | **✓** |
YAML is a genuine improvement over JSON: it's more compact and covers the full data model. But it was designed for humans to write, not for LLMs to consume. Verbose block arrays (`- item` per line) and full-word booleans (`true`/`false`) limit its token savings, and its specification is notoriously complex to implement.
TOON goes further on token reduction but falls apart with nested objects — it only works for flat, uniform tabular data. If your payload has any nesting, TOON can't represent it.
TERSE was designed to close that gap: full JSON data model coverage, with token efficiency as the primary design constraint.
---
## The five design principles
**1. Bare strings** — identifiers and common values require no quotation marks. `production` stays `production`, not `"production"`. Quotes are reserved for strings that actually need them — those containing spaces, reserved characters, or special syntax.
**2. Compact primitives** — `null`, `true`, and `false` become single characters: `~`, `T`, `F`. These are three of the most common values in any payload, and each shrinks to one character.
**3. Implicit delimiters** — spaces separate values inside objects and arrays. No commas are needed between array elements or object members.
**4. Schema arrays** — the biggest token win for tabular data. Uniform arrays of objects declare their fields once, then list values positionally:
```yaml
users:
#[id name role active]
1 Alice admin T
2 Bruno editor T
3 Carla viewer F
```
The equivalent JSON repeats `"id"`, `"name"`, `"role"`, `"active"` on every single row. For a 100-row dataset, that's 400 unnecessary key repetitions.
**5. Recursive structure** — all constructs nest arbitrarily. Objects inside arrays inside schema arrays — all valid, all compact. No flat-only limitations.
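The schema-array idea from principle 4 is easy to sketch in code. The snippet below is an illustration only, not the reference implementation: `is_uniform`, `encode_cell`, and `encode_schema_array` are hypothetical names, cells are assumed to be simple values without spaces, and the whitespace layout (the `indent` parameter) is an assumption that the spec may define differently.

```python
def is_uniform(rows):
    """True when every row is a dict with the same keys in the same order."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    fields = list(rows[0])
    return all(list(r) == fields for r in rows)


def encode_cell(value):
    """Encode one cell; values containing spaces would need quoting (not handled)."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    return str(value)


def encode_schema_array(key, rows, indent=""):
    """Declare the field names once, then emit one positional row per object."""
    if not is_uniform(rows):
        raise ValueError("schema arrays need rows with identical keys")
    fields = list(rows[0])
    lines = [f"{key}:", f"{indent}#[{' '.join(fields)}]"]
    lines += [indent + " ".join(encode_cell(r[f]) for f in fields) for r in rows]
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bruno", "role": "editor", "active": True},
    {"id": 3, "name": "Carla", "role": "viewer", "active": False},
]
print(encode_schema_array("users", users))

# Key names written: once per row in JSON, once in total with a schema header.
json_keys = len(users) * len(users[0])
schema_keys = len(users[0])
print(f"JSON writes {json_keys} key names, the schema array writes {schema_keys}")
```

The saving grows linearly with the number of rows, which is why uniform tabular data benefits most.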
---
## A real example: nested order
**JSON** (~180 tokens):
```json
{
  "orderId": "ORD-001",
  "customer": {
    "name": "Rafael Torres",
    "email": "[email protected]"
  },
  "items": [
    {"sku": "A1", "qty": 2, "price": 9.99},
    {"sku": "B3", "qty": 1, "price": 24.50}
  ],
  "paid": true,
  "notes": null
}
```
**TERSE** (~95 tokens):
```yaml
orderId: ORD-001
customer: {name:"Rafael Torres" email:[email protected]}
items:
#[sku qty price]
A1 2 9.99
B3 1 24.50
paid: T
notes: ~
```
This is where TERSE separates itself from TOON and CSV — deeply nested structures work exactly as expected.
---
## You don't write TERSE by hand
The workflow is identical to JSON:
```plaintext
Your data (object/dict)
↓
serialize() ← terse-js or terse-py
↓
TERSE string ← sent to the LLM
↓
parse() ← if you need it back
↓
Your data again
```
Just like nobody writes `JSON.stringify()` output by hand — you call the function. TERSE works the same way. The format is optimized for the one reader that actually matters: the LLM.
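The `parse()` step at the bottom of that diagram can be sketched for the simplest case: flat `key: value` documents with bare values. This is a toy decoder under stated assumptions, not the `terse-js` or `terse-py` API: the function names are hypothetical, and quoted strings, nested objects, and schema arrays are deliberately left out.

```python
import re

NUMBER = re.compile(r"-?\d+(\.\d+)?")


def decode_scalar(token):
    """Map a bare token back to None, a bool, a number, or a string."""
    if token == "~":
        return None
    if token == "T":
        return True
    if token == "F":
        return False
    if NUMBER.fullmatch(token):
        return float(token) if "." in token else int(token)
    return token  # anything else is treated as a bare string


def decode_value(text):
    """Decode a scalar or a space-separated array of scalars."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        return [decode_scalar(t) for t in text[1:-1].split()]
    return decode_scalar(text)


def parse_flat(document):
    """Parse one `key: value` pair per non-empty line into a dict."""
    result = {}
    for line in document.splitlines():
        if not line.strip():
            continue
        key, _, rest = line.partition(":")
        result[key.strip()] = decode_value(rest)
    return result


terse_text = """\
user_id: 1001
status: active
data: [feature_a feature_b]
verified: T
"""
print(parse_flat(terse_text))
```

In a real pipeline you would call the library's own functions instead. The point is only that the text round-trips back to ordinary data structures.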
---
## On design intent: why not compress further?
TERSE could go deeper — automatic key abbreviation, binary type encoding, dictionary compression. We deliberately stopped short of that.
The goal is a format that remains **human-auditable**: you can open a `.terse` file in any text editor and understand what you're looking at without tooling. In LLM pipelines, auditability is a safety property, not just a convenience. When an agent misbehaves, you need to inspect its inputs.
---
## Two questions that come up
**Can I use TERSE for REST API communication between microservices?**
You can, but it's not the primary use case. REST APIs are consumed by many clients across different teams and languages — JSON's universal support is a real advantage there. TERSE shines where you control both ends: serializing data before sending it to an LLM, and parsing the response on the other side.
**Can I use TERSE for application configuration, like YAML?**
Yes — the format supports everything YAML does for config files: nested objects, arrays, typed values, comments. Worth considering if your config is also consumed by an LLM as context.
---
## What's available today
The project includes:
- **Formal specification** (v0.7) with ABNF grammar, conformance rules, and security considerations — published on Zenodo with DOI: [10.5281/zenodo.19058364](https://doi.org/10.5281/zenodo.19058364)
- **Reference implementations** in TypeScript, Python, Java, and Go
- **Live playground** where you can paste JSON and see the TERSE output in real time
Everything is open source under MIT (implementations) and CC BY 4.0 (specification).
---
## Links
- 🌐 **Landing page + playground**: [rudsoncarvalho.github.io/terse-format](https://rudsoncarvalho.github.io/terse-format)
- 📦 **GitHub**: [github.com/RudsonCarvalho/terse-format](https://github.com/RudsonCarvalho/terse-format)
- 📄 **Spec (Zenodo DOI)**: [10.5281/zenodo.19058364](https://doi.org/10.5281/zenodo.19058364)
- `npm install terse-js`
- `pip install terse-py`
---
TERSE is still a draft — v0.7 is open for community review. If you work with LLM pipelines at scale, I'd love to hear whether this addresses a real pain point in your stack.
---
*Rudson Kiyoshi Souza Carvalho — Independent Researcher*