THE INFORMED LAYER
Thought Experiment

The $500m token bill, as a distribution

Axios reported that one company accidentally spent half a billion dollars on tokens in a single month, after leaving Claude licences uncapped. What size of company would that take, and how much would each engineer have to use? For an org of 10,000 engineers, that's $50k per engineer per month. But any org that size has power users and lighter users, so what might the distribution look like?

Assumptions

Floor and skew set the per-engineer shape. The total sets the headcount and the peak.

Mean sits at percentile (n): 85
Sets the tail index α ≈ 1.30. A Pareto mean always sits above the 63rd percentile (the limit 1 − 1/e ≈ 0.632 as α grows), so n starts there. The higher the percentile, the heavier the right tail, i.e. the further the top power user sits above everyone else.
Floor (low-tail user): $100/wk
The minimum token spend of any engineer: $100 a week, or about $433 a month.
Total monthly spend: $500m
The figure we're testing. Log scale, $1m → $2b.
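The figures on this page are consistent with a Pareto (Type I) model whose minimum is the floor, so here is a minimal sketch of how the percentile setting could map to α under that assumption. The CDF is F(x) = 1 − (x_m/x)^α and the mean is α·x_m/(α − 1), so the mean sits at percentile 1 − ((α − 1)/α)^α, which falls towards 1 − 1/e ≈ 0.632 as α grows. The sketch inverts that by bisection; the function names are illustrative, not the page's own code.

```python
import math

def mean_percentile(alpha: float) -> float:
    """Percentile (0..1) at which a Pareto(alpha) mean sits; needs alpha > 1."""
    return 1.0 - ((alpha - 1.0) / alpha) ** alpha

def alpha_for_percentile(p: float, lo: float = 1.0 + 1e-9, hi: float = 1e6) -> float:
    """Find the alpha whose mean sits at percentile p, by bisection.

    mean_percentile falls monotonically from 1 (alpha -> 1) towards
    1 - 1/e ~ 0.632 (alpha -> infinity), so p must lie in that range.
    """
    if not (1.0 - 1.0 / math.e) < p < 1.0:
        raise ValueError("p must lie between 1 - 1/e and 1")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_percentile(mid) > p:
            lo = mid  # mean still too far up the tail: alpha must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0

def weekly_to_monthly(weekly: float) -> float:
    """Convert a weekly amount to an average month (52 weeks / 12 months)."""
    return weekly * 52.0 / 12.0

alpha = alpha_for_percentile(0.85)
print(f"alpha = {alpha:.4f}")                          # about 1.3049, shown as 1.30
print(f"floor = ${weekly_to_monthly(100):,.0f}/month")  # about $433
```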
Implied headcount: 269,632 engineers required at this total and distribution
Mean per engineer: $1.9k per month
Median per engineer: $0.7k per month, the typical user
Implied top user: $6.3m per month, roughly the spend level only one engineer in the whole headcount is expected to exceed
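Under the same Pareto assumption, the headline figures follow in closed form: mean α·x_m/(α − 1), median x_m·2^(1/α), headcount = total ÷ mean. For the top user, this sketch takes the spend level that N engineers would be expected to exceed only once, x_m·N^(1/α); that reproduces the $6.3m above, but it is our reading of the page, and the expected maximum of a Pareto sample is higher. The α of 1.3049 is assumed to be the unrounded value behind the displayed 1.30, and `summarise` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class SpendSummary:
    mean: float       # $ per engineer per month
    median: float     # $ per engineer per month
    headcount: float  # engineers needed to reach the total
    top_user: float   # $ per month that one engineer in `headcount` exceeds

def summarise(total: float, floor: float, alpha: float) -> SpendSummary:
    """Pareto(floor, alpha) model of monthly spend per engineer (alpha > 1)."""
    mean = alpha * floor / (alpha - 1.0)
    median = floor * 2.0 ** (1.0 / alpha)
    headcount = total / mean
    # Solve headcount * P(X > x) = 1, i.e. (floor / x) ** alpha = 1 / headcount
    top_user = floor * headcount ** (1.0 / alpha)
    return SpendSummary(mean, median, headcount, top_user)

# Assumed inputs: $500m total, $100/wk floor (~$433/month), alpha for n = 85.
s = summarise(total=500e6, floor=100 * 52 / 12, alpha=1.3049)
print(f"headcount {s.headcount:,.0f}")
print(f"mean ${s.mean:,.0f}, median ${s.median:,.0f}, top ${s.top_user / 1e6:.1f}m")
```

The headcount lands within a few dozen of the page's 269,632; the gap comes from rounding α to four places.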
Hard to reconcile
Chart: monthly spend per engineer, log axis. The curve starts at the floor and trails right; the mean lands far above the median. Markers: floor (assumed minimum), median, mean, implied top user, and $150k (highest reported).
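To see the chart's shape without closed forms, a quick Monte Carlo sketch (again assuming the Pareto model) draws per-engineer spend with Python's `random.paretovariate`, which returns values of at least 1, and scales them by the floor. With α this close to 1 the variance is infinite, so the sample mean wanders from seed to seed while the median stays put. The sample size and seed are arbitrary choices.

```python
import random
import statistics

def simulate_spend(n: int, floor: float, alpha: float, seed: int = 0) -> list[float]:
    """Draw n monthly spends from Pareto(floor, alpha) via random.paretovariate."""
    rng = random.Random(seed)
    return [floor * rng.paretovariate(alpha) for _ in range(n)]

draws = simulate_spend(n=100_000, floor=100 * 52 / 12, alpha=1.3049)
print(f"min    ${min(draws):,.0f}")                # never below the floor
print(f"median ${statistics.median(draws):,.0f}")  # near the analytic ~$737
print(f"mean   ${statistics.fmean(draws):,.0f}")   # noisy, typically well above the median
print(f"max    ${max(draws):,.0f}")
```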

Token conversion

Opus 4.7 at standard API rates: $5/M input, $25/M output. On the Claude for Enterprise plan the seat fee buys access only: every token (chat, Claude Code, Cowork) bills at these rates on top, with no cap unless an admin sets one. The blended rate depends on the ratio of input tokens (context read) to output tokens (text written).

Input : output mix: 8 : 1
Token volume ratio. Agentic coding leans on input (large contexts, re-sent codebases), so 8:1 to 20:1 is the realistic band. Blended rate: (8 × $5 + 1 × $25) ÷ 9 = $7.22/M.
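The blended rate is a volume-weighted average of the two list prices quoted above. A minimal sketch, with a hypothetical `blended_rate` helper, showing both ends of the 8:1 to 20:1 band:

```python
def blended_rate(input_ratio: float, output_ratio: float = 1.0,
                 input_price: float = 5.0, output_price: float = 25.0) -> float:
    """Volume-weighted $ per million tokens for a given input:output token mix."""
    total = input_ratio + output_ratio
    return (input_ratio * input_price + output_ratio * output_price) / total

for ratio in (8, 20):
    print(f"{ratio}:1 -> ${blended_rate(ratio):.2f}/M")
# 8:1 -> $7.22/M
# 20:1 -> $5.95/M
```

A heavier input lean pulls the blend towards the $5 input price, so the same $500m buys more tokens at 20:1 than at 8:1.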
Tokens this month: 69 trillion at this spend and mix
Run rate: 27M tokens per second, all month (30-day month)
Blended price: $7.22 per million tokens
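The token volume and run rate are straight division. This sketch assumes a 30-day month, which is what reproduces the 27M tokens per second shown; the names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month

def token_volume(total_spend: float, price_per_million: float) -> float:
    """Tokens bought by total_spend at a blended $ per million tokens."""
    return total_spend / price_per_million * 1e6

price = (8 * 5.0 + 1 * 25.0) / 9  # blended $/M at an 8:1 mix, ~7.22
tokens = token_volume(500e6, price)
rate = tokens / SECONDS_PER_MONTH
print(f"{tokens / 1e12:.0f} trillion tokens, {rate / 1e6:.0f}M tokens/s")
# 69 trillion tokens, 27M tokens/s
```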

Against some reference corpora

Log-scaled bars, so the multiples matter more than the lengths. Counts aren't strictly comparable across tokenisers (Opus 4.7's differs from the one behind the training-corpus figure), so treat them as orders of magnitude. References: Wikipedia at ~5.2 billion words; the Llama 3.1 training corpus at ~15 trillion tokens.
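A rough sketch of the multiples behind the bars, using the month's ~69 trillion tokens and the two reference sizes quoted. It divides tokens by Wikipedia's word count as if words and tokens were interchangeable, which they are not, so read the log10 column rather than the exact multiple.

```python
import math

# ~69 trillion: $500m at the blended 8:1 rate of (8 * $5 + $25) / 9 per million
MONTH_TOKENS = 500e6 / ((8 * 5 + 25) / 9) * 1e6

# Reference sizes as quoted in the text. Units differ (words vs tokens), and so
# do tokenisers, so only the order of magnitude of each multiple means much.
REFERENCES = {
    "Wikipedia (~5.2bn words)": 5.2e9,
    "Llama 3.1 training corpus (~15tn tokens)": 15e12,
}

for name, size in REFERENCES.items():
    multiple = MONTH_TOKENS / size
    print(f"{name}: {multiple:,.1f}x (~10^{math.log10(multiple):.1f})")
```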

Per developer, in Linux kernels

The kernel source runs to about 40 million lines, roughly 350 million tokens. Converted at the blended rate, here is how much of it the distribution's typical (median) and heaviest users get through each month.

Typical developer: 29% of the kernel per month
Heaviest user: 2,488× the whole kernel per month
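The kernel figures are per-user spend converted to tokens at the blended rate, then divided by the kernel's ~350 million tokens. The spend inputs below are assumed values rounded from the Pareto model's median and top user, and the helper name is hypothetical.

```python
KERNEL_TOKENS = 350e6            # ~40m lines of source, as quoted above
PRICE_PER_M = (8 * 5 + 25) / 9   # blended $/M at an 8:1 mix, ~7.22

def kernels_per_month(monthly_spend: float) -> float:
    """How many whole-kernel token counts a monthly spend buys."""
    tokens = monthly_spend / PRICE_PER_M * 1e6
    return tokens / KERNEL_TOKENS

median_spend = 737.0   # assumed: the model's median, ~$0.7k/month
top_spend = 6.29e6     # assumed: the model's implied top user, ~$6.3m/month
print(f"typical: {kernels_per_month(median_spend):.0%} of the kernel")
print(f"heaviest: {kernels_per_month(top_spend):,.0f}x the kernel")
```

With these inputs the sketch lands on the same 29% and 2,488× shown above.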