CROSSPOST: NOAH SMITH: How Much More Software Do We Really Need?
“Not much of the kinds that we are used to”, says Noah Smith. His subhead: probably a lot, but not necessarily the kinds people have made money on so far. It is not that scaling laws are nearly exhausted for the machines. It is that the scaling law by which the machine’s being able to do more meant that it was worthwhile to let it command our attention and take over our work has hit rapidly diminishing returns.
CROSSPOST: NOAH SMITH: How much more software do we really need?
Probably a lot, but not necessarily the kinds people have made money on so far.
Jun 02, 2026
So, Anthropic is going to IPO! The company is valued at almost $1 trillion, so this is going to be one of the biggest IPOs in history — the only other competitor being SpaceX, which is also set to go public soon. It’ll be one of the largest wealth creation events in history — the company’s seven founders are each going to be worth almost $20 billion, and regular employees will be worth in the millions to tens of millions. So much for my chances of buying a house in San Francisco!
Whether Anthropic is worth this valuation is not the topic of this post, but I guess it’s interesting to touch on. Anthropic is showing more impressive revenue growth than any company in history, having recently blown past OpenAI to an annualized rate of about $45 billion per year. Worries that the company would be unprofitable have been blown away by this hypergrowth — Anthropic is about to turn its first operating profit.
In fact, I think the price being offered for Anthropic is pretty conservative. A multiple of 20x annualized revenue really isn’t that expensive for a company growing at 130% a quarter. Obviously that’s going to level out at some point soon, but it would take only a little over one more year of that sort of growth for Anthropic to be priced like a value stock. The cautious pricing probably reflects the danger of competition, both from OpenAI and from the cheap Chinese open-source models perpetually nipping at the leaders’ heels.
The reason for Anthropic’s meteoric rise, of course, is the success of coding agents. For years, OpenAI had struggled to find a market for its state-of-the-art chatbots; everyone was wowed by the technology, and everyone used it, but people couldn’t figure out how to get it to produce lots of economic value. Anthropic basically solved that problem by being the first to invent usable coding agents — AIs that write software on their own. Claude Code, Anthropic’s agentic software, gained a huge amount of brand value, even though OpenAI’s Codex product is competitive in terms of quality.
This was true product-market fit. AI had already proved that it worked in terms of the underlying technology, probably around 2024, when reasoning models cut down on the hallucination problem. Now it had found its killer app: the equivalent of e-commerce and search for the internet, or spreadsheets and word processing for computers. Suddenly, everyone in the world was “tokenmaxxing”: trying to use coding agents as much as humanly possible.
I first encountered this trend at a dinner event on the economics of AI (I go to a lot of those dinners these days). An entrepreneur at the dinner breathlessly told me and a couple of other attendees that he ordered his employees to “spend their salary in tokens” — that is, to create so much code with Claude Code and Codex that it cost as much as their entire paycheck. I remember asking him: “What are they using all those tokens to create?” I don’t think I got a straight answer; I’m not sure he knew.
He wasn’t alone, though. Plenty of companies encouraged their employees to use AI coding agents as much as possible. Meta even briefly had a leaderboard for who could use the most tokens. One company reportedly spent half a billion dollars on Claude Code — equal to one percent of Claude’s annualized revenue!
Reading these reports, I just kept wondering: What are all these tokens actually producing? Just like with that guy at dinner, there never seemed to be a clear answer. Were Amazon and Meta and other software companies rolling out new features? Not that I’ve seen. A lot more apps are being submitted to the App Store, but I’ve only heard of one good one (Refine.ink). I’m sure there are more out there, but so far it’s nothing like the early days of the smartphone, where I was hearing about cool new apps every couple of weeks.
Maybe it was all on the back end? I’m not a software guy, so I don’t have a proper grasp of how hard it is to make a website like Instagram run, or optimize the cloud servers at AWS. Sites and apps aren’t loading faster or obviously more reliable. Was advertising getting better? Are click-through rates improving? Were companies fixing their long-standing problems, taking care of “tech debt” so they can avoid paying large costs in the future? Maybe!
I kept quiet about these questions, since it’s not really my area of expertise. But I saw a lot of other people — people who know a lot more than I do about software engineering — asking similar things. John Loeber wrote:
The stuff I’m hearing is just insane. People are spending hundreds of thousands of dollars a month on tokens? Guys, what are you shipping?…I am seeing people fully enraptured by illusions of productivity. They have swarms of agents coordinated by Byzantine Octopus harnesses. They’re munging thousands of tokens a second. They’re doing all this stuff, churning unfinished marginalia faster than ever before. Spinning their wheels and shipping absolutely jack shit for their customers…[W]e’re getting a lot of utility from AI for engineering at our company. I think we would really struggle to burn more than $5K per engineer per month.
Uber COO Andrew Macdonald said it wasn’t yet possible to draw a link between raw AI usage and useful products actually being shipped:
“That link is not there yet, right?” [Macdonald] said. “I think maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25% more useful consumer features.’”...He said that the trade-off costs from AI are harder to justify because he can’t draw a direct link.
Microsoft, meanwhile, began canceling Claude Code licenses. Salesforce started redesigning their employee targets to measure real output instead of AI input. And people who looked into the matter basically confirmed the suspicion that a lot of this AI coding wasn’t going into actual products being shipped:
For companies using advanced AI coding tools, only 18% of spending on tokens is translating into shipped coding products that reach real users, according to EntelligenceAI, a startup that aggregated data on more than 2,000 companies using advanced AI tools for coding.
Jellyfish, a company that tracks AI usage, found rapidly diminishing returns in terms of converting tokens to actual software.
You should absolutely NOT take this to mean that AI is a bubble, or that the tech doesn’t actually work, or that Anthropic’s IPO is overpriced, etc. A lot of this is perfectly normal. When a very capable new general-purpose technology bursts onto the scene — steam power, electricity, computing, the internet, etc. — a ton of people play around with it to see how it works and experiment with how they might be able to use it. That experimentation is healthy, and we shouldn’t expect it to last forever.
It’s also reasonable for companies to push their software engineers to try something radically new. Most professionals who have written code by hand all their lives will naturally be reluctant to switch over to letting a machine take the first crack at it. Rewarding AI usage for its own sake is silly in the long run — it’s just as subject to Goodhart’s Law as anything else, and it predictably resulted in people checking the weather with AI just to hit their targets. But in the short run, it could be good to shove stodgy old engineers out of their comfort zone.
But I also think there are two more interesting things that are potentially going on here:
Companies are finding out, once again, that turning task-level productivity into economic productivity is a lot harder than it looks. This has implications for the big “AI and jobs” debate, upon which the shape of our future society could hinge.
It’s very possible that the software industry as we know it is a mature industry, like steelmaking or internal combustion. If AI creates major improvements in software, it’s possible — even likely — that it’ll be in new types of software industries instead of just “better Facebook and Amazon”.
Tokenmaxxing versus bottlenecks…
Brad here: That is where the free portion cuts off. A teaser of bullet points assembled from the rest of the article, most of which is very smart:
AI-driven automation faces “weak links” & computing power is not productivity: output is ultimately constrained by the last, least-automated tasks: having 100 million times 1970s computing power has not made individual workers 100 million times more productive.
Within firms, task automation hits the same ceiling: even spectacular acceleration of coding does not explode total corporate productivity because downstream tasks remain human bottlenecks.
The “consumer internet” frontier looks saturated: with both users and their daily attention largely maxed out, new consumer software mostly displaces incumbents rather than expanding total usage—and delivers marginal benefits (at best, given the costs of attention-hacking - b.) in terms of real human utility.
Robotics & radically reconfigured business processes could justify tokenmaxxing: controlling the physical world and reshaping firms around what AI can do offer far higher long-term upside than another social app.
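The “weak links” and within-firm bottleneck points above can be made concrete with an Amdahl's-law style calculation. What follows is a minimal sketch, not anything taken from Noah's article: the function names `overall_speedup` and `speedup_ceiling` are hypothetical, and the 30% share of work that coding is assumed to represent is an illustrative number only.

```python
def overall_speedup(automated_share: float, task_speedup: float) -> float:
    """Amdahl's-law speedup of a whole workflow when only one part is accelerated.

    automated_share: fraction of total work time that AI accelerates (0 to 1).
    task_speedup:    how many times faster that fraction becomes (> 0).
    """
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    if task_speedup <= 0:
        raise ValueError("task_speedup must be positive")
    # Time left over = untouched human work + accelerated work.
    remaining_time = (1.0 - automated_share) + automated_share / task_speedup
    return 1.0 / remaining_time


def speedup_ceiling(automated_share: float) -> float:
    """Upper bound on the overall speedup, however fast the automated part gets."""
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    if automated_share == 1.0:
        return float("inf")
    return 1.0 / (1.0 - automated_share)


if __name__ == "__main__":
    coding_share = 0.30  # ASSUMPTION: illustrative only, not a figure from the article
    for speedup in (2, 10, 100, 1_000_000):
        total = overall_speedup(coding_share, speedup)
        print(f"coding {speedup:>9,}x faster -> whole workflow {total:.2f}x faster")
    print(f"ceiling at this share: {speedup_ceiling(coding_share):.2f}x")
```

Under these assumptions, the overall gain flattens out near the ceiling set by the unautomated share, no matter how many tokens go into the automated part. Only shrinking the human-bottleneck share itself raises the ceiling.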
Brad here: I think Noah is broadly right. Natural-language interfaces to structured and unstructured databases are wonderful, but marginal in the scale of the economy as a whole. Very large-scale, very big-data, very high-dimension, very flexible-function classification, regression, and prediction analyses are also wonderful, and will enable valuable things we cannot now understand. How valuable we do not really know. The key role played by electricity in the shift from the applied science to the mass production mode of societal organization was not something that anyone could have predicted in 1890.
In many ways, history tends to fool us with respect to our ability to predict the future. It is a fact that the things that happen almost always happen in the least unlikely way possible: hence retrodiction is relatively easy. But it is also a fact that the things that do manage to happen are almost always very unlikely things: hence prediction is impossible, and the lesson of history is that long-term success requires prioritizing robustness and optionality.
Back up: Back when ChatGPT emerged, my view quickly became that I had to stay on top of these technologies, precisely because the fact that natural-language interfaces fit all of our psychological affordances so well meant that they would be a very, very big deal, and yet that people would have a very, very, very hard time successfully coming to terms with their powers and limitations. So far so good: that is how it is going. Programming simulations, broad search-and-summarization, and database transformations and indexations are now five times easier for me than they were back before LLMs, hence I find myself doing a lot more in the way of quick-and-dirty simulations, broad search-and-summarization, and launching database transformation and indexation tasks. Otherwise, however, the current state of the MAMLMs is a lot like the current state of the driver-assist features of the Volkswagen. I am:
always terrified that the automatic braking and lane keeping and distance following features are going to turn themselves off,
and driving has become a much more cognitively intensive and challenging task than it used to be when I let the systems take the lead,
because I have to: (1) figure out how I would be driving in this situation, (2) compare that to what the machine is doing, and (3) figure out which of us is actually more right.
It makes driving at the speed limit much more cognitively demanding and interesting. But it is a slight minus in that I am actually less able to think about other things, even in the background, while driving. Similarly, nursing and maintaining the MAMLMs so that they stay on task and do not go bonkers at midnight:
All that churns up about as much time and cognitive load as they save in terms of things that I no longer have to spend time doing, with, as I said, the exceptions of quick-and-dirty simulation, broad search-and-summarization, and database transformation and indexation tasks. Now I have to take a look and see what the machine actually did with the 84,468 URLs that its LLM heart decided that it was worth querying the WayBack Machine for over the past week. I probably am going to be happy I used my dark silicon underneath the dining room side table to do that, but it will take some work on my part to see.
That is the current state of things. It will change. But how much will it change, and in what directions?
What I left on Noah’s substack - a conversation with Gemini about this topic: https://g.co/gemini/share/61515dbc43fd