The $500m token bill, as a distribution

Axios reported that one company accidentally spent half a billion dollars on tokens in a single month, after leaving Claude licences uncapped. What size of company would that take, and how much would they have to use? For an org of 10,000 engineers, that's $50k per engineer per month — but of course in every org that size there are power users and slower users. What might the distribution look like?

Assumptions

Floor and skew set the per-engineer shape. The total sets the headcount and the peak.

Mean sits at percentile (n): 85

This sets the Pareto tail index: at n = 85, α = 1.30. A Pareto mean always sits above the 63rd percentile (the limit, 1 − 1/e, as α grows large), so n starts there. The higher the percentile, the more right-skewed the distribution, i.e. the more the top power user outspends everyone else.
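A minimal sketch of how n could map to α, assuming a standard Pareto (Type I) distribution, for which the share of people below the mean is 1 − ((α − 1)/α)^α. The function names and the bisection approach are illustrative assumptions, not the page's own code.

```python
import math

def mean_percentile(alpha: float) -> float:
    """Share of a Pareto(alpha) population that spends less than the mean."""
    return 1.0 - ((alpha - 1.0) / alpha) ** alpha

def alpha_from_mean_percentile(n: float) -> float:
    """Bisect for the alpha whose mean sits at quantile n (needs 1 - 1/e < n < 1)."""
    if not (1.0 - 1.0 / math.e) < n < 1.0:
        raise ValueError("n must lie between 1 - 1/e (about 0.632) and 1")
    lo, hi = 1.0 + 1e-9, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_percentile(mid) > n:
            lo = mid  # mean sits too high in the distribution, so alpha is too small
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = alpha_from_mean_percentile(0.85)
print(f"alpha = {alpha:.2f}")
```

Under these assumptions, n = 0.85 should land close to the 1.30 shown above, and values of n nearer 0.632 push α up sharply.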

Floor (low-tail user): $100/wk

The minimum token spend of any engineer, set per week; $100/wk is about $433 a month.

Total monthly spend: $500m

The figure we're testing. Log scale, $1m → $2b.

Implied headcount: 269,632 engineers required at this total & distribution

Mean / engineer: $1.9k per month

Median / engineer: $0.7k per month (the typical user)

Implied top user: $6.3m per month, the expected biggest spender

Hard to reconcile

Monthly spend per engineer, log axis. The curve starts at the floor and trails right; the mean lands far above the median.

Chart markers: floor (assumed min), median, mean, implied top user, and $150k (highest reported).
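The headline figures can be roughly reproduced from the three assumptions. This is a sketch under stated assumptions, not the page's actual code: it assumes a Pareto (Type I) distribution, a month of 52/12 weeks, an unrounded tail index of about 1.3049 (displayed as 1.30), and a top user taken as the 1 − 1/N quantile, which is what seems to match the displayed $6.3m. The function name `pareto_summary` is hypothetical.

```python
def pareto_summary(total_usd: float, floor_per_week: float, alpha: float) -> dict:
    """Per-engineer monthly spend figures for a Pareto(alpha) population."""
    floor = floor_per_week * 52 / 12           # assumed weekly-to-monthly conversion
    mean = floor * alpha / (alpha - 1)         # Pareto mean (requires alpha > 1)
    median = floor * 2 ** (1 / alpha)          # Pareto median
    headcount = total_usd / mean               # engineers needed to reach the total
    top = floor * headcount ** (1 / alpha)     # assumed: the 1 - 1/N quantile
    return {
        "floor": floor,
        "mean": mean,
        "median": median,
        "headcount": headcount,
        "top": top,
    }

# alpha = 1.3049 is an assumption: the unrounded value implied by n = 85.
summary = pareto_summary(total_usd=500e6, floor_per_week=100, alpha=1.3049)
for name, value in summary.items():
    print(f"{name:>9}: {value:,.0f}")
```

The gap between the last figure and the $150k highest reported spend is what the chart is pointing at: with a tail this heavy, the implied top user lands well over an order of magnitude above it.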

The same dollars, in Opus 4.7 tokens

Token conversion

Opus 4.7 at standard API rates: $5/M input, $25/M output. On the Claude for Enterprise plan the seat fee buys access only — every token (chat, Claude Code, Cowork) bills at these rates on top, with no cap unless an admin sets one. The blended rate depends on how much of the spend reads context versus writes output.

Input : output mix: 8 : 1

The ratio of input to output tokens by volume. Agentic coding leans on input: large context, re-sent codebases. So 8:1 to 20:1 is the realistic band. At 8:1 the blended rate is $7.22/M.
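The blended rate is a volume-weighted average of the two prices. A short sketch, using the $5/M and $25/M rates quoted above; the function name `blended_rate` is illustrative.

```python
def blended_rate(input_parts: float, output_parts: float = 1.0,
                 input_price: float = 5.0, output_price: float = 25.0) -> float:
    """Average price per million tokens for a given input : output volume mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts

# The two ends of the 8:1 to 20:1 band.
for mix in (8, 20):
    print(f"{mix}:1 -> ${blended_rate(mix):.2f}/M")
```

Pushing the mix towards input lowers the blended rate, so the same dollars buy more tokens at 20:1 than at 8:1.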

Tokens this month: 69 trillion at this spend and mix

Run rate: 27M tokens / second, all month

Blended price: $7.2 per million tokens
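The volume and run-rate figures follow from dividing the spend by the blended price. A rough sketch, assuming a 30-day month (an assumption that seems to match the rounded run rate shown); the names here are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumed 30-day month

def token_volume(spend_usd: float, price_per_million: float) -> float:
    """Tokens bought for a given spend at a blended price per million tokens."""
    return spend_usd / price_per_million * 1e6

blended = (8 * 5 + 1 * 25) / 9           # 8:1 mix at $5/M input, $25/M output
tokens = token_volume(500e6, blended)
per_second = tokens / SECONDS_PER_MONTH

print(f"{tokens / 1e12:.0f} trillion tokens")
print(f"{per_second / 1e6:.0f}M tokens / second")
```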

Against some reference corpora

Log-scaled bars, so the multiples matter more than the lengths. Counts aren't strictly comparable across tokenisers (Opus 4.7's differs from the one behind the training-corpus figure), so treat them as orders of magnitude. Reference figures: Wikipedia at ~5.2 billion words; the Llama 3.1 training corpus at ~15 trillion tokens.

Per developer, in Linux kernels

The kernel source runs to about 40 million lines, near 350 million tokens. Converted at the blended rate, this is what the distribution's typical and heaviest users get through each month.

Typical developer: 29% of the kernel / month

Heaviest user: 2,488× the whole kernel / month
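The kernel comparison is the same conversion applied per person. A sketch, assuming the 8:1 blended rate and the ~350 million token kernel estimate above; the dollar inputs are the approximate unrounded median and top-user figures implied by the distribution (displayed as $0.7k and $6.3m), and the function name is hypothetical.

```python
KERNEL_TOKENS = 350e6                    # kernel size estimate from the text
BLENDED_PER_MILLION = (8 * 5 + 25) / 9   # 8:1 mix at $5/M input, $25/M output

def kernels_per_month(spend_usd: float) -> float:
    """How many kernel-sized token volumes a monthly spend works out to."""
    tokens = spend_usd / BLENDED_PER_MILLION * 1e6
    return tokens / KERNEL_TOKENS

# Assumed unrounded inputs: roughly $737 (median) and $6.29m (top user).
print(f"typical:  {kernels_per_month(737):.0%} of the kernel")
print(f"heaviest: {kernels_per_month(6.29e6):,.0f}x the kernel")
```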