YAML vs JSON for LLM Token Efficiency - The Minification Truth
Pretty-printed JSON uses 101 tokens, YAML uses 133, but minified JSON only uses 41 - the comparison everyone gets wrong
2 minute read
Core Insight
The "YAML saves tokens over JSON" claims are misleading because they compare pretty-printed JSON (with whitespace) against compact YAML. Minified JSON is just as efficient or more efficient than YAML.
The Original Claim
A Medium article claims JSON uses 96 tokens against a lower YAML count: https://medium.com/better-programming/yaml-vs-json-which-is-more-efficient-for-language-models-5bc11dd0f6df
The Real Numbers
Using the OpenAI Tokenizer on identical data (a reproduction sketch follows the list):
- Pretty-printed JSON: 101 tokens
- YAML: 133 tokens
- Minified JSON: 41 tokens
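For reference, a minimal way to reproduce counts like these in Python is the tiktoken package. This is a sketch, not the exact setup behind the numbers above; using the cl100k_base encoding is an assumption, and counts vary by model and tokenizer version.

import tiktoken

# Assumption: cl100k_base, the GPT-3.5/GPT-4-era encoding; other models use other encodings.
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    # Encode the text and count the resulting token IDs.
    return len(enc.encode(text))

print(count_tokens('{"user":{"id":123,"name":"Alice"}}'))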
Why This Matters
JSON doesn't require whitespace - spaces and newlines are optional noise that inflates token counts artificially.
YAML is whitespace-dependent by design - indentation and newlines carry semantic meaning. You can't "minify" YAML the same way.
Comparing pretty-printed JSON to YAML isn't a fair test. The only valid comparison is minified JSON vs YAML.
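Minifying JSON is a one-liner in most languages. A minimal Python sketch (the file name payload.json is just a placeholder for your own data):

import json

# Load pretty-printed JSON (placeholder path) and re-serialize it without
# optional whitespace. separators=(",", ":") drops the spaces json.dumps
# would otherwise insert after commas and colons.
with open("payload.json") as f:
    data = json.load(f)

minified = json.dumps(data, separators=(",", ":"))
print(minified)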
Test Data Example
YAML (133 tokens)
user:
  id: 123
  name: Alice
  email: alice@example.com
  is_active: true
  roles:
    - admin
    - editor
project:
  id: 456
  title: Knowledge Base Migration
  status: in_progress
  tags:
    - migration
    - docs
    - backend
  tasks:
    - id: 1
      description: Export old database
      completed: true
    - id: 2
      description: Transform schema
      completed: false
    - id: 3
      description: Import into new system
      completed: false
Minified JSON (41 tokens)
{"user":{"id":123,"name":"Alice","email":"alice@example.com","is_active":true,"roles":["admin","editor"]},"project":{"id":456,"title":"Knowledge Base Migration","status":"in_progress","tags":["migration","docs","backend"],"tasks":[{"id":1,"description":"Export old database","completed":true},{"id":2,"description":"Transform schema","completed":false},{"id":3,"description":"Import into new system","completed":false}]}}
Counter-Argument: Training Data Similarity
Noah Larratt's point: "LLMs perform well when content looks like training data" - whitespace formatting may act as a heuristic that nudges the model toward higher-quality output.
This is worth testing, but token efficiency and output quality are separate concerns. If you're optimizing for tokens, minify. If you're optimizing for output quality based on training distribution, test both.
Bottom Line
Don't cargo-cult optimization advice. If you're using pretty-printed JSON, minify it - in examples like this you'll save well over half the tokens (101 down to 41). Test your specific use case rather than accepting blanket claims about format efficiency.
Notes
- If you're working in a browser and want to be able to read the data, ask whatever LLM you're using to return it minified in a markdown code block, then copy/paste that into a formatter to prettify it
- If you're working from a CLI or programmatically, pretty-print it for yourself locally (a minimal sketch follows). Don't ask the LLM to give you back pretty-printed JSON just because you want to be able to read it.
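Pretty-printing locally is free in token terms. A minimal sketch that reads minified JSON from stdin and prints an indented copy:

import json
import sys

# Read minified JSON from stdin and print an indented copy for human eyes.
# The model never sees this formatting, so it costs zero extra tokens.
data = json.load(sys.stdin)
print(json.dumps(data, indent=2))

The standard library's python -m json.tool does the same thing from the command line.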
