TERSE — A New Serialization Format Built for LLMs — DeepSeek Blog | Neura Market

    Rudson Kiyoshi Souza Carvalho March 31, 2026

    JSON carries ~47% structural noise on every LLM API call. TERSE is a new open-source serialization format that covers the full JSON data model with 30–55% fewer tokens. Formal spec, 4 reference implementations, live playground.

*JSON is the default. But defaults were built for a different world.*

---

Every time you send structured data to a Large Language Model, you pay for it token by token. And if you're using JSON, which almost everyone is, you're paying for a lot of characters that carry no information.

Take this simple payload:

```json
{
  "user_id": 1001,
  "status": "active",
  "data": ["feature_a", "feature_b"],
  "verified": true
}
```

Count the noise: braces, quotes around every key and string value, commas, colons with spaces. Now imagine this multiplied across thousands of API calls per day. That's real money.

I built **TERSE** to address this.

---

## What is TERSE?

**TERSE** (Token-Efficient Recursive Serialization Encoding) is a text-based data serialization format designed to represent the complete JSON data model with substantially fewer tokens, which makes it cheaper to use as input to Large Language Models.

The same payload in TERSE:

```yaml
user_id: 1001
status: active
data: [feature_a feature_b]
verified: T
```

Same information. ~47% fewer tokens.

---

## How it compares

| Format | Token savings vs JSON | Full JSON coverage? |
|--------|-----------------------|---------------------|
| JSON | baseline | ✓ |
| YAML | ~20% | ✓ (verbose arrays) |
| TOON | ~40% | ✗ (flat data only) |
| **TERSE** | **~47%** | **✓** |

YAML is a genuine improvement over JSON: it's more compact and covers the full data model. But it was designed for humans to write, not for LLMs to consume. Verbose arrays (`- item` per line), full-word booleans (`true`/`false`), and a notoriously complex parser spec limit its token savings.
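The savings quoted in this post are token counts measured by the author. As a rough sanity check on your own payloads, the sketch below measures how much of a JSON string is syntax rather than data. It is a character-level proxy only: tokenizers do not split text character by character, so it will not reproduce the token percentages above. The helper names `content_chars` and `structural_share` are hypothetical and are not part of any TERSE package.

```python
import json


def content_chars(value) -> int:
    """Count characters that carry data: key names, string contents, numbers, literals."""
    if isinstance(value, dict):
        # Keys are assumed to be strings, as in JSON; strip their two quotes.
        return sum(len(json.dumps(k)) - 2 + content_chars(v) for k, v in value.items())
    if isinstance(value, list):
        return sum(content_chars(v) for v in value)
    if isinstance(value, str):
        return len(json.dumps(value)) - 2  # drop the surrounding quotes
    return len(json.dumps(value))          # numbers, true/false, null


def structural_share(value, compact: bool = True) -> float:
    """Fraction of the serialized JSON text that is syntax rather than data."""
    separators = (",", ":") if compact else (", ", ": ")
    text = json.dumps(value, separators=separators)
    return (len(text) - content_chars(value)) / len(text)


payload = {
    "user_id": 1001,
    "status": "active",
    "data": ["feature_a", "feature_b"],
    "verified": True,
}

print(f"compact JSON: {structural_share(payload):.0%} structural characters")
print(f"spaced JSON:  {structural_share(payload, compact=False):.0%} structural characters")
```

For real cost estimates, swap the character counts for counts from the tokenizer of the model you actually call.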
TOON goes further on token reduction but falls apart with nested objects: it only works for flat, uniform tabular data. If your payload has any nesting, TOON can't represent it.

TERSE was designed to close that gap: full JSON data model coverage, with token efficiency as the primary design constraint.

---

## The five design principles

**1. Bare strings.** Identifiers and common values require no quotation marks. `production` stays `production`, not `"production"`. Quotes are reserved for strings that actually need them: those containing spaces, reserved characters, or special syntax.

**2. Compact primitives.** `null`, `true`, and `false` become single characters: `~`, `T`, `F`. Three of the most common values in any payload, each reduced to one token.

**3. Implicit delimiters.** Spaces separate values inside objects and arrays. No commas between array elements or object members.

**4. Schema arrays.** The biggest token win for tabular data. Uniform arrays of objects declare their fields once, then list values positionally:

```yaml
users:
  #[id name role active]
  1 Alice admin T
  2 Bruno editor T
  3 Carla viewer F
```

The equivalent JSON repeats `"id"`, `"name"`, `"role"`, `"active"` on every single row. For a 100-row dataset, that's 400 key occurrences in JSON against 4 in TERSE.

**5. Recursive structure.** All constructs nest arbitrarily. Objects inside arrays inside schema arrays: all valid, all compact. No flat-only limitations.
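To make the five principles concrete, here is a minimal Python sketch of an encoder that applies them. It is an illustration only, not the reference implementation, and it does not claim conformance with the TERSE spec. Everything in it is an assumption made for the example: the function names (`scalar`, `inline`, `dumps`), the `BARE` pattern that decides which strings may go unquoted, the two-space indentation of schema rows, and the choice to use schema arrays only for top-level keys.

```python
import json
import re

# Assumption: strings matching this pattern are safe to emit without quotes.
# The authoritative rule is in the TERSE spec; this pattern is illustrative.
BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_.@-]*")
RESERVED = {"T", "F"}  # bare T/F would be read back as booleans


def scalar(value) -> str:
    """Encode a JSON scalar (principles 1 and 2)."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, str):
        if BARE.fullmatch(value) and value not in RESERVED:
            return value              # bare string, no quotes
        return json.dumps(value)      # quote anything ambiguous
    return json.dumps(value)          # numbers keep their JSON form


def inline(value) -> str:
    """Encode any value on one line, space-delimited and recursive (principles 3 and 5)."""
    if isinstance(value, dict):
        return "{" + " ".join(f"{scalar(k)}:{inline(v)}" for k, v in value.items()) + "}"
    if isinstance(value, list):
        return "[" + " ".join(inline(v) for v in value) + "]"
    return scalar(value)


def is_uniform_table(value) -> bool:
    """True for a non-empty list of dicts with identical keys and scalar-only values."""
    if not (isinstance(value, list) and value and all(isinstance(r, dict) for r in value)):
        return False
    keys = list(value[0])
    return bool(keys) and all(
        list(row) == keys and not any(isinstance(v, (dict, list)) for v in row.values())
        for row in value
    )


def dumps(obj: dict) -> str:
    """Encode a top-level object, using a schema array for uniform tables (principle 4)."""
    lines = []
    for key, value in obj.items():
        if is_uniform_table(value):
            lines.append(f"{scalar(key)}:")
            lines.append("  #[" + " ".join(scalar(k) for k in value[0]) + "]")
            lines.extend("  " + " ".join(scalar(v) for v in row.values()) for row in value)
        else:
            lines.append(f"{scalar(key)}: {inline(value)}")
    return "\n".join(lines)


print(dumps({
    "env": "production",
    "users": [
        {"id": 1, "name": "Alice", "role": "admin", "active": True},
        {"id": 2, "name": "Bruno", "role": "editor", "active": True},
    ],
    "notes": None,
}))
```

The quoting rule is deliberately conservative: a string such as `"1001"` or `"T"` is quoted so that a parser cannot confuse it with a number or a boolean. A production encoder would take these rules from the spec's grammar rather than from a hand-written pattern.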
---

## A real example: nested order

**JSON** (~180 tokens):

```json
{
  "orderId": "ORD-001",
  "customer": {
    "name": "Rafael Torres",
    "email": "[email protected]"
  },
  "items": [
    {"sku": "A1", "qty": 2, "price": 9.99},
    {"sku": "B3", "qty": 1, "price": 24.50}
  ],
  "paid": true,
  "notes": null
}
```

**TERSE** (~95 tokens):

```yaml
orderId: ORD-001
customer: {name:"Rafael Torres" email:[email protected]}
items:
  #[sku qty price]
  A1 2 9.99
  B3 1 24.50
paid: T
notes: ~
```

This is where TERSE separates itself from TOON and CSV: deeply nested structures work exactly as expected.

---

## You don't write TERSE by hand

The workflow is identical to JSON:

```plaintext
Your data (object/dict)
        ↓
   serialize()      ← terse-js or terse-py
        ↓
   TERSE string     ← sent to the LLM
        ↓
   parse()          ← if you need it back
        ↓
Your data again
```

Nobody writes `JSON.stringify()` output by hand; you call the function. TERSE works the same way. The format is optimized for the one reader that actually matters: the LLM.

---

## On design intent: why not compress further?

TERSE could go further: automatic key abbreviation, binary type encoding, dictionary compression. We deliberately stopped short of that.

The goal is a format that remains **human-auditable**: you can open a `.terse` file in any text editor and understand what you're looking at without tooling. In LLM pipelines, auditability is a safety property, not just a convenience. When an agent misbehaves, you need to inspect its inputs.

---

## Two questions that come up

**Can I use TERSE for REST API communication between microservices?**

You can, but it's not the primary use case. REST APIs are consumed by many clients across different teams and languages, and JSON's universal support is a real advantage there. TERSE shines where you control both ends: serializing data before sending it to an LLM, and parsing the response on the other side.
**Can I use TERSE for application configuration, like YAML?**

Yes. The format supports what config files typically need: nested objects, arrays, typed values, comments. Worth considering if your config is also consumed by an LLM as context.

---

## What's available today

The project includes:

- **Formal specification** (v0.7) with ABNF grammar, conformance rules, and security considerations, published on Zenodo with DOI: [10.5281/zenodo.19058364](https://doi.org/10.5281/zenodo.19058364)
- **Reference implementations** in TypeScript, Python, Java, and Go
- **Live playground** where you can paste JSON and see the TERSE output in real time

Everything is open source under MIT (implementations) and CC BY 4.0 (specification).

---

## Links

- 🌐 **Landing page + playground**: [rudsoncarvalho.github.io/terse-format](https://rudsoncarvalho.github.io/terse-format)
- 📦 **GitHub**: [github.com/RudsonCarvalho/terse-format](https://github.com/RudsonCarvalho/terse-format)
- 📄 **Spec (Zenodo DOI)**: [10.5281/zenodo.19058364](https://doi.org/10.5281/zenodo.19058364)
- `npm install terse-js`
- `pip install terse-py`

---

TERSE is still a draft: v0.7 is open for community review. If you work with LLM pipelines at scale, I'd love to hear whether this addresses a real pain point in your stack.

---

*Rudson Kiyoshi Souza Carvalho, Independent Researcher*

    Tags

    llm, opensource, ai, token

