Microsoft killed Claude Code for 100,000 engineers. But it's not an AI problem.
The token bills are the symptom. The real problem is something that nobody is talking about.
On May 23, Microsoft told its Experiences and Devices division to stop using Claude Code by June 30. That’s the group that builds Windows, Office 365, Teams, and Surface. Around 100,000 engineers lost access.
The official reason was toolchain consolidation. Move to GitHub Copilot CLI instead.
The real reason came out through The Verge, Windows Central, and subsequent reporting: the token bills were unsustainable. Microsoft had rolled out Claude Code to this division in December 2025 as a “learning exercise.” Six months later the annual AI coding budget was gone.
This is not a Microsoft story. It’s an industry story that Microsoft made visible.
In April, Uber’s CTO disclosed that Uber burned through its entire 2026 AI coding budget in four months. Claude Code adoption inside Uber went from 32% to 84% of roughly 5,000 engineers between December and March. Engineers were spending $500 to $2,000 a month each on tokens. His quote: “I’m back to the drawing board because the budget I thought I would need is blown away already.”
GitHub paused new sign-ups for Copilot Pro and Pro+ in November 2025 because its own heaviest users were costing more than the subscription price covered. NVIDIA’s VP of Applied Deep Learning told Axios in April: “For my team, the cost of compute is far beyond the costs of the employees.”
Why the budget math broke
Enterprise software has always been priced per seat. Token pricing doesn’t work that way. The cost is determined by how much the model reasons, not how many people use it.
A single Claude Code session averages 47 tool calls. A regular chat exchange is one. That’s roughly 50x the inference demand. And the same task can cost 30x more or less depending on how the agent decides to approach it. You can’t build a quarterly forecast on that kind of variance.
Token prices fell roughly 280x over the past two years. Enterprise AI spend went up 320% over the same period. Cheaper tokens didn’t lower the bill. They enabled more loops, longer sessions, more parallel work.
The cost problem is a symptom
DORA 2025 surveyed roughly 5,000 respondents and found that individual developer productivity with AI tools is up significantly. Organizational delivery metrics (deployment frequency, change failure rate, time to restore) are flat or declining. A 25% increase in AI adoption correlated with a 1.5% drop in delivery throughput.
Developers are faster. Organizations are not shipping faster. That gap is where the money is actually being wasted.
The core problem is not token pricing. It’s giving thousands of engineers expensive autonomous tools with no coordination layer between what those tools produce and what the organization ships. Each engineer runs their own agentic sessions in isolation. The agents don’t know what other agents are doing. They don’t know the architecture. They don’t know what broke last time someone touched this service. They start every session as a stranger.
The token waste is a side effect. Agents loop because they lack context. They redo work other agents already did. They produce changes that get rejected in review because they didn’t know the constraints. Every wasted cycle is billed.
Moving 100,000 engineers from Claude Code to GitHub Copilot doesn’t fix this. The agents are still stateless. The engineers are still uncoordinated. You just moved the bill from Anthropic to Azure.
What Stripe figured out
Stripe merges over 1,300 AI-written pull requests per week. No budget crisis. No flat delivery metrics.
The difference is not the model. The difference is that Stripe spent years building coordination infrastructure before turning on the AI firehose: 10-second reproducible devboxes, 3 million automated tests, 400+ internal MCP tools that give agents organizational context, a merge queue that keeps main green under high volume.
Their agents don’t start as strangers. They get context before they write a line of code. Their output is validated by infrastructure, not just human review. The whole thing is coordinated.
Most organizations are trying to get Stripe’s AI velocity by giving engineers better AI tools. That’s the wrong lever. Without the coordination layer, better tools just produce more uncoordinated output faster.
The real math
AI coding tools give you 30-50% individual velocity gains. But AI-generated PRs create 1.7x more issues per review, wait 4.6x longer before a human picks them up, and have a 32.7% acceptance rate. Code churn has more than doubled since 2020.
The velocity gains are consumed by rework, review overhead, and duplication. The net delivery improvement at the org level is close to zero.
The fix is not cheaper compute. It’s a coordination layer that gives agents organizational context, routes work by risk, prevents duplication, and enforces policy at the infrastructure level. That is what turns individual productivity gains into organizational delivery. Without it, you are paying for velocity that never arrives as shipped software.
The coding tool makes the engineer faster. The coordination layer makes the organization actually ship what the engineer produced. The second problem is bigger, and it’s what the Microsoft and Uber stories are really about.
I wrote a follow-up on what the solution looks like, and why it can save organizations significantly more than the AI coding tools themselves generate: The AI engineering coordination layer: why it’s worth more than your coding tools

