TOON vs JSON for LLMs: Don't Rip Out JSON Just Yet

TOON (Token-Oriented Object Notation) has been getting a lot of attention. It’s a compact, human-readable format designed specifically for passing structured data to LLMs as a drop-in replacement for JSON.
The core claim is simple: TOON can save ~30–60% tokens vs formatted JSON for structured, tabular data.
Where It Gets Interesting (and Messy)
- Most pre-training corpora and tool-usage traces the big models have seen are JSON, XML, YAML, CSV, HTML — not TOON. You’re effectively asking the model to generalize to a new, sparsely-seen notation.
- Benchmarks are mixed: some “official” tests show small gains vs JSON, while independent experiments report equal or worse accuracy, especially on more nested data.
- There are no peer-reviewed papers or formal workshop studies yet that compare TOON vs JSON at scale.
My Current Stance
Treat TOON as a niche optimization, not a universal replacement. Be extra cautious where accuracy matters more than a 30–40% context reduction. And as always: evals and A/B tests are your best friends here.