We've been focused on how AI is accelerating the creation (building) of software, substituting expensive development tools for even more expensive human experts. The next looming economic challenge is at runtime – where our customers rack up huge token costs for us as they use our latest AI features. As adoption scales up, we will owe our AI vendors hefty operational fees. This raises some classic product/engineering challenges.
(Shardul Mehta provided a really good prompt here.)
Recent non-AI examples: MoviePass, which lost more money as its customers saw more movies, and Spotify, which pays most of its revenue to artists – even at scale.
Example: We're a company with portfolio management software for professional investors. Traditionally, we've focused on administration and record-keeping: tracking trades and price histories, profit/loss calculations, tax impact estimation. Our R&D team is now using AI tools to build our (traditional, deterministic) products faster and maybe cheaper. We pay Anthropic and Amazon and Google and Microsoft for that.
But we also want to add some AI-driven analytic features to let our users make smarter investment choices. Such as "do detailed historical return comparison on these different portfolio models and ETF bundles." Now we'll incur runtime costs each time they do analysis within our product. Very smart (or very inexperienced!) users could easily spend millions of our tokens. Which we also buy from our AI model vendors.
This will chop our traditional SaaS margins, and we don't want to go bankrupt giving away expensive tokens to fixed-price subscribers. So it's worth some operating-cost thoughts.
An Old Lesson
During the early 90's database wars, I was at Sybase – which had an elegant technical solution to an emerging systems problem. Queries could be very slow, expensive, and poorly written – sometimes running for hours without delivering the desired results, all while consuming lots of system resources. Sybase pioneered "stored procedures," which let our customers "compile" frequently used queries for more speed and efficiency. For instance, stored procedures enabled programmatic stock market trades – letting algorithms make buy/sell choices much faster and more precisely than humans at screens. SQL databases were trendy and exciting, but dramatically faster queries saved massive system resources. (And created new opportunities.)
Pulling that observation forward… we should see classic engineering and product approaches reappear as ways to manage AI runtime costs.
Context-Specific Leverage
If we're focused on specific customers solving a narrow set of hard problems, we can provide AI content that reduces the need for generalized learning. Our portfolio software company should pre-compute decades of ROI data on the top 5000 securities (to save the tokens of doing it again); create skills or scripts for portfolio design and correlation computations; and pre-load the most important academic papers about risk-adjusted returns. What else would make our context smarter and cheaper to operate?
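A minimal sketch of the precomputation idea: answer from a cached store when we can, and only pay for a model call on a miss. The names `precomputed_roi` and `ask_model` are hypothetical stand-ins for a real data store and a real (token-metered) model call; the figures are illustrative only.

```python
# Precomputed-results cache: pay tokens at most once per security.
precomputed_roi = {
    "SPY": {"10yr_annualized": 0.121},   # illustrative figures, not real data
    "AGG": {"10yr_annualized": 0.015},
}

def historical_roi(ticker: str) -> dict:
    """Return cached ROI data when we have it; only spend tokens on a miss."""
    if ticker in precomputed_roi:
        return precomputed_roi[ticker]          # zero token cost
    result = ask_model(f"Compute 10-year annualized ROI for {ticker}")
    precomputed_roi[ticker] = result            # never pay for it twice
    return result

def ask_model(prompt: str) -> dict:
    # Placeholder for a metered LLM call -- an assumption, not a real API.
    return {"10yr_annualized": None}
```

The same pattern extends to correlation matrices and pre-loaded research: anything deterministic and frequently requested belongs in the cache, not the prompt.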
Optimizing Happy Paths
If well-bounded, our new features should lead users to a few high-volume activities that add lots of value. That could include "calculate five-year correlation and covariance between these two investment types." Clarity here lets our brilliant engineers address the slowest steps of our most frequent asks.
"Claude: representing a software architect, look at the last 50k requests and suggest ways we could reduce the token cost of calculating correlations among asset classes. Include ideas around caching, initial data values, precomputed results, parallel computation, logarithmic estimation, and Monte Carlo simulations… explicitly consider accuracy versus token cost trade-offs. Cite instances where these strategies have improved performance in previous generations of commercial application software."
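One classic answer we'd expect back: move the happy path off the model entirely. Pairwise correlation is deterministic math, so compute it in ordinary code and cache the result rather than spending tokens per request. A sketch, with illustrative price histories:

```python
from functools import lru_cache

# Illustrative return series; a real system would pull these from our records.
PRICE_HISTORY = {
    "equities": (100.0, 102.0, 101.0, 105.0, 107.0),
    "bonds":    (100.0, 100.5, 101.0, 100.8, 101.2),
}

def pearson(xs, ys):
    """Plain Pearson correlation -- no tokens required."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

@lru_cache(maxsize=None)          # identical asks are free after the first one
def correlation(asset_a: str, asset_b: str) -> float:
    key = tuple(sorted((asset_a, asset_b)))   # (a, b) and (b, a) share one entry
    return pearson(PRICE_HISTORY[key[0]], PRICE_HISTORY[key[1]])
```

Reserving the model for interpretation ("what does a 0.3 correlation mean for this portfolio?") and doing the arithmetic in code is exactly the accuracy-versus-token-cost trade-off the prompt asks about.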
Throttling Usage
Naive users may run into predictable (and expensive) problems. Some legitimate corner cases may be a bad fit for our app – or be very expensive to run. And some activities may be inappropriate, so we'll ban them in our Terms & Conditions.
It's easy for new users to spin up unbounded-and-useless research requests. "Do a correlation analysis of all publicly traded stocks and commodities worldwide against the top 100 US federally provided economic indicators. Group by industry and geography and ownership structure and profitability and years since founding." That query will run forever. So we'll add a throttle that pauses requests at 90k tokens and notifies the user.
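A sketch of that throttle: meter tokens per request and pause (rather than kill) the analysis once it crosses the budget, handing control back to the user. The 90k threshold mirrors the limit above; the class and exception names are hypothetical.

```python
TOKEN_BUDGET = 90_000   # per-request ceiling before we check in with the user

class BudgetExceeded(Exception):
    """Raised so the caller can notify the user and offer to continue."""

class MeteredRequest:
    def __init__(self, budget: int = TOKEN_BUDGET):
        self.budget = budget
        self.spent = 0

    def charge(self, tokens: int) -> None:
        """Record token spend from each model call; pause past the budget."""
        self.spent += tokens
        if self.spent >= self.budget:
            raise BudgetExceeded(
                f"Paused at {self.spent:,} tokens -- confirm before continuing."
            )
```

Wrapping every model call in `charge()` turns a runaway query into a polite checkpoint instead of a surprise bill.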
We don't intend this platform to execute real-time trades, especially since we're not registered with the SEC or licensed to do that. So let's explicitly exclude requests to trade. Likewise, let's block inquiries for non-public information that would get everyone into regulatory trouble.
Like mobile carriers, we can set a very high price on huge overages. If we expect reasonable users to consume 2M tokens/month, an exorbitant price for usage above 100M tokens/month would punish the few massive abusers (or drive them away) without upsetting our best customers.
Launch an agent to identify abuse thresholds, estimate elasticity, and model pricing strategies.
Anticipating Problems (and Features)
Some of our users are more creative than we are. With better tools, we can now use that to our advantage. What should we be productizing next quarter?
"Claude, group this month's AI requests from our 25 most active clients into themes. Summarize each theme including likely reasons for those requests, identify those that are trending positive, and estimate average token cost per request. Suggest which would be good additions to our offering."
Sound Byte
For the first time in a generation, the cost of running software will start to dominate the cost of building software. Let's act strategically instead of being surprised.