
Tokenmaxxing isn't productivity - and we have the receipts.

TechCrunch ran a piece on April 17 calling out tokenmaxxing: measuring developer productivity by AI budget rather than work shipped. The piece is right. We have the empirical answer, and it's in our public benchmark table.

The article, in one paragraph

Tokenmaxxing is the practice of measuring developer productivity by how many AI tokens someone gets to spend, rather than how much working code they ship. The piece backs this up with three data sources.

Translation: AI acceptance rates that look like 80-90% on the first commit collapse to 10-30% after the inevitable revisions a few weeks later. The token budget went up, the shipped-work throughput barely did, and a lot of the "productivity" is actually churn paid for at premium rates.

What the article doesn't say

The article diagnoses the disease without prescribing a cure beyond "measure outcomes, not inputs." That's true and unhelpful in equal measure. Two questions remain:

  1. What metric replaces token spend?
  2. What do you change in the workflow so the metric improves?

The metric

Cost per shipped session. Take total dollar (or token) spend across N sessions; divide by the number of sessions that produced merged-quality work. A session that ran for 30 turns, burned $4, and ended in "I think we should clarify the requirements" counts as zero in the denominator.

This is the column we report in the benchmarks table. It's the only column that survives contact with reality.
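The metric fits in a few lines. A minimal sketch in Python (the session records and field names are illustrative, not Hydrate's internal schema; the rule is the one above: an unshipped session adds to the numerator and zero to the denominator):

```python
def cost_per_shipped_session(sessions):
    """Total spend divided by the number of sessions that shipped.

    Each session is a (cost_usd, shipped) pair. A $4, 30-turn session
    that ends in "I think we should clarify the requirements" adds to
    the numerator and nothing to the denominator.
    """
    total = sum(cost for cost, _ in sessions)
    shipped = sum(1 for _, ok in sessions if ok)
    if shipped == 0:
        return float("inf")  # all spend, no shipped work
    return total / shipped

# Seven sessions, two shipped, $0.60 total - the Haiku 4.5 baseline row.
baseline = [(0.05, True), (0.05, True)] + [(0.10, False)] * 5
print(f"${cost_per_shipped_session(baseline):.2f}")  # $0.30
```

Note the failure mode: a run where nothing ships has infinite cost per shipped session, which is exactly how it should read.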

The workflow change

Most of the churn from AI assistants has one root cause: the model re-discovers your project's architecture every session, makes a different choice than it did last time, and a later session edits that choice back to the original. Multiply by ten sessions and you get an 861% churn figure with nothing to show for it.

Hydrate's response is two-part:

  1. Episodic memory. Two Claude Code hooks (UserPromptSubmit + Stop) capture every session and inject the relevant atomic facts into the next one. Architecture, naming, decisions - all carried forward, not re-derived.
  2. Authoritative canon. hydrate sync writes pinned architectural rules directly into CLAUDE.md as project instructions - the channel Claude Code treats as authoritative, not soft context. We measured a 40% → 80% canon-resistance flip just from moving rules into this channel, with no model upgrade.
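Hydrate's hook internals aren't published here, but the mechanism is standard Claude Code plumbing: whatever a UserPromptSubmit hook prints to stdout gets injected into the session's context. A minimal sketch of that injection half, assuming a hypothetical facts file at ~/.hydrate/facts.json - not Hydrate's actual implementation:

```python
#!/usr/bin/env python3
"""Hypothetical UserPromptSubmit hook: print stored atomic facts so
Claude Code injects them as context. Sketch only, not Hydrate's code."""
import json
import sys
from pathlib import Path

FACTS = Path.home() / ".hydrate" / "facts.json"  # hypothetical location

def render_facts(facts):
    """Turn a list of atomic fact strings into an injectable block."""
    if not facts:
        return ""
    lines = ["Project facts carried over from earlier sessions:"]
    lines += [f"- {fact}" for fact in facts]
    return "\n".join(lines)

if __name__ == "__main__":
    sys.stdin.read()  # hook payload arrives as JSON; unused in this sketch
    facts = json.loads(FACTS.read_text()) if FACTS.exists() else []
    print(render_facts(facts))  # stdout becomes session context
```

The Stop-hook half would do the inverse: parse the finished transcript and append new facts to the same file, so the next UserPromptSubmit starts warm.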

The receipts

Numbers from our published benchmark (/benchmarks, run on the same Anthropic models any reader can hit):

Cell                                   Sessions shipped   Total cost   Per shipped session
Haiku 4.5 baseline                     2 / 7              $0.60        $0.30
Haiku 4.5 + Hydrate                    7 / 7              $1.08        $0.154
Opus 4.7 baseline (simple)             7 / 7              $11.49       $1.64
Opus 4.7 + Hydrate (simple)            7 / 7              $4.92        $0.703
Hybrid (Sonnet seed → Haiku+Hydrate)   7 / 7              $1.41        $0.20
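The last column is just the division from the metric section; rerunning it on the table's own numbers, plus the Hybrid-to-Opus ratio, takes a few lines of Python:

```python
# Benchmark rows as (sessions shipped, total cost in USD).
rows = {
    "Haiku 4.5 baseline":  (2, 0.60),
    "Haiku 4.5 + Hydrate": (7, 1.08),
    "Opus 4.7 baseline":   (7, 11.49),
    "Opus 4.7 + Hydrate":  (7, 4.92),
    "Hybrid":              (7, 1.41),
}
per = {cell: cost / shipped for cell, (shipped, cost) in rows.items()}
for cell, value in per.items():
    print(f"{cell}: ${value:.3f} per shipped session")

# Hybrid vs. raw Opus: $0.20 / $1.64
print(f"{per['Hybrid'] / per['Opus 4.7 baseline']:.0%}")  # 12%
```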

The Hybrid cell is the direct counter to the Waydev finding. Where tokenmaxxing produces "2x throughput at 10x cost," the Hybrid cell produces identical 7/7 shipping at 12% of the raw-Opus cost per shipped session ($0.20 vs $1.64). Same Anthropic models; different memory layer; one architectural sync at the start of the run.

Caveats (read these)

Try it

Hydrate is in beta. Installing and registering now locks in the $5/mo Pro rate forever - and Pro features stay free during beta and for 30 days after v1 launches:

curl -fsSL gethydrate.dev/install | sh
hydrate register --edition=free --email=you@example.com

Free tier covers 2 active projects forever. The dashboard at localhost:8089/dashboard reports your own cost-per-shipped-session on your own ledger - so you can stop taking benchmark numbers (ours or anyone else's) on faith.