Why the model bill is only part of the cost
The model bill is rarely the biggest cost of running an agent in production. It's often not even in the top three. That sounds unreasonable right up until a team does the math properly, at which point it becomes uncomfortable.
On a recent MLOps Community podcast episode, Rani Radhakrishnan from PwC made a version of this point, and it's worth pulling apart because the pre-launch ROI math for most production agents is wrong in a specific, predictable way.
The calculation teams usually do before deploying runs something like this. Estimate the headcount the agent replaces or avoids hiring. Subtract the projected model spend. Report the difference as savings. If the number comes out positive the project gets funded, and for a lot of deployments it does come out positive on paper.
The calculation six months later is harder to run because the costs are scattered across cost centres that were never designed to be added together. Total monthly LLM provider bill. Vector DB line on the AWS bill. Retrieval traffic. A percentage of a platform engineer's time. Some fraction of the support lead's week spent reviewing agent outputs. The eval harness running nightly. The Langfuse or Logfire seat. The occasional sprint of work to update prompts and tool definitions when a downstream system changes.
Of those, the reviewer time is usually the most consequential and the least tracked. An agent that handles seventy percent of inbound support tickets reduces support volume by seventy percent, which is roughly what it says on the tin. What it doesn't do is eliminate a person. It redirects one or two people from handling tickets to reviewing agent output for the edge cases where getting it wrong is costly, plus routing escalations, plus keeping the eval set current when new ticket categories emerge, plus running the weekly review of where the agent is drifting. Those people don't disappear off the cost line. They move from being support staff to being AI ops staff, usually at similar or higher loaded cost. The pre-launch spreadsheet assumed they'd either leave or be redeployed to revenue-producing work. In practice they mostly do neither, because the agent creates work that didn't exist before it shipped, and that work has to be done by someone who understands both the business process and the system. That person is the one who was doing the business process.
This is the point Rani was getting at when she said the ROI math isn't headcount in minus agent cost out. It's the cost of building the system, plus the reviewer, plus the retrieval and storage on the back end, plus the change management around the process, all compared to what the work cost before. The last three of those line items didn't exist anywhere on the cost line when humans did the whole job.
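To make the gap between the two calculations concrete, here is a minimal sketch in Python. Every figure, field name, and function name in it is a made-up assumption for illustration, not data from the podcast or from any real deployment. It compares the pre-launch estimate (headcount avoided minus model spend) with a fully loaded before-and-after comparison of the same process.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCostAfter:
    """Monthly cost of the process with the agent in the loop (hypothetical fields)."""
    remaining_manual_work: float   # tickets the agent does not handle
    reviewer_time: float           # reviewing output, routing escalations, eval upkeep
    model_spend: float             # LLM provider bill
    retrieval_and_storage: float   # vector DB, retrieval traffic
    build_amortized: float         # build cost spread over its expected life
    change_management: float       # prompt and tool updates, process changes

    def total(self) -> float:
        return (
            self.remaining_manual_work
            + self.reviewer_time
            + self.model_spend
            + self.retrieval_and_storage
            + self.build_amortized
            + self.change_management
        )


def naive_savings(heads_avoided: float, loaded_cost_per_head: float,
                  model_spend: float) -> float:
    """The pre-launch spreadsheet: headcount avoided minus projected model spend."""
    return heads_avoided * loaded_cost_per_head - model_spend


def loaded_savings(cost_before: float, after: MonthlyCostAfter) -> float:
    """Fully loaded cost of the process before, minus everything it costs after."""
    return cost_before - after.total()


# Illustrative numbers only: a three-person team at an assumed loaded cost each.
LOADED_COST_PER_HEAD = 10_000
cost_before = 3 * LOADED_COST_PER_HEAD

after = MonthlyCostAfter(
    remaining_manual_work=10_000,
    reviewer_time=8_000,
    model_spend=2_000,
    retrieval_and_storage=1_500,
    build_amortized=4_000,
    change_management=1_000,
)

naive = naive_savings(2, LOADED_COST_PER_HEAD, after.model_spend)
loaded = loaded_savings(cost_before, after)

print(f"pre-launch estimate: {naive:,.0f} per month")
print(f"fully loaded:        {loaded:,.0f} per month")
```

With these assumed inputs the project still pays off, but by a fraction of what the spreadsheet promised. Which line items belong in the "after" column will differ by team; the point of the sketch is only that they are added together in one place.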
A second problem compounds with reviewer time: the per-task cost of running an agent usually isn't what teams think it is. The sticker price for a model call is public: a few cents for a Haiku-class model, tens of cents for a Sonnet-class or GPT-class model once the context window gets moderately loaded. But production agent tasks don't run as single model calls. A realistic agent task triggers retrieval, often a reranker, a planning call, two or three tool calls with their own context windows, and then a synthesis call. Per-task token consumption at P99 can be ten to twenty times the P50. Monthly cost tracking usually reports the mean, which hides the tail, and the tail is where the month's budget goes.
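A small sketch of why the mean hides the tail, using only the Python standard library. The token counts are invented for illustration (97 ordinary tasks and 3 that loop on tool calls), and the nearest-rank `percentile` helper is one simple definition among several in common use.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-task token totals, summed across every call in the task
# (retrieval, planning, tool calls, synthesis).
task_tokens = [4_000] * 97 + [60_000, 70_000, 80_000]

p50 = percentile(task_tokens, 50)
p99 = percentile(task_tokens, 99)
mean = statistics.mean(task_tokens)

# Share of all tokens consumed by tasks above the median.
tail_share = sum(t for t in task_tokens if t > p50) / sum(task_tokens)

print(f"P50:  {p50:,} tokens")
print(f"P99:  {p99:,} tokens ({p99 / p50:.1f}x the P50)")
print(f"mean: {mean:,.0f} tokens")
print(f"tasks above the median use {tail_share:.0%} of all tokens")
```

In this made-up sample the mean sits close to the median and looks unremarkable, while three tasks in a hundred account for roughly a third of the tokens. Tracking per-task P50 and P99 side by side is what surfaces that.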
Teams that catch this usually catch it by accident. A specific workflow spikes, someone investigates, and the trace shows a handful of edge cases looping on tool calls or retrying degraded responses. The fix is usually cheap: recursion caps, tool call budgets, a routing layer that sends simpler tasks to smaller models. The controls all exist. None of them gets built if no one is watching the right number in the first place, and the right number isn't aggregate spend.
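As a rough sketch of what two of those controls can look like, here is a per-task tool call budget and a crude router in plain Python. The class and function names, the limits, the model names, and the word-count heuristic are all hypothetical placeholders; a real agent framework will have its own hooks for this.

```python
class ToolBudgetExceeded(RuntimeError):
    """Raised when a single task exceeds its call or token allowance."""


class ToolCallBudget:
    """Per-task cap on the number of tool calls and the tokens they consume."""

    def __init__(self, max_calls: int, max_tokens: int):
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.calls = 0
        self.tokens = 0

    def charge(self, tokens: int) -> None:
        """Record one tool call; raise once either limit is exceeded."""
        self.calls += 1
        self.tokens += tokens
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise ToolBudgetExceeded(
                f"stopped after {self.calls} calls and {self.tokens} tokens"
            )


def pick_model(task_text: str, word_threshold: int = 200) -> str:
    """Hypothetical router: word count as a crude stand-in for task complexity."""
    if len(task_text.split()) <= word_threshold:
        return "small-model"
    return "large-model"


def run_task(budget: ToolCallBudget, tokens_per_call: int = 1_500) -> int:
    """Simulate an agent stuck retrying a tool; return the calls completed."""
    completed = 0
    try:
        while True:  # stands in for an edge case looping on tool calls
            budget.charge(tokens_per_call)
            completed += 1
    except ToolBudgetExceeded:
        # In a real system this is where the task would escalate to a human.
        return completed


completed_calls = run_task(ToolCallBudget(max_calls=5, max_tokens=20_000))
print(f"loop stopped after {completed_calls} tool calls")
print(f"short ticket routed to: {pick_model('reset my password')}")
```

The budget is deliberately per task rather than per month, since a per-task limit is what turns a runaway loop into a visible escalation instead of a line buried in aggregate spend.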
The implication isn't that agents don't pay off. Plenty do. It's that the ROI case that gets a project funded is almost never the ROI case that survives six months of production, and the gap between the two is mostly about work that moved rather than work that disappeared.
The more useful question to answer before shipping a production agent is narrower. For the specific process being agentified, what was the end-to-end cost before, fully loaded, including the parts of the existing process that wouldn't show up on a timesheet? And what is it after, including all the parts of the new system that exist only because the agent is now in the loop? The difference between the two is a harder number to produce, and it's usually smaller than the pre-launch savings estimate by some margin. It's also the one the CFO will eventually ask for.
Explored further with Rani Radhakrishnan on the MLOps Community podcast, also available on Spotify.
