You Won’t See Failure First. You’ll See Cost

The Alarm That Didn’t Go Off

In the early 2000s, during my first week as the sole Oracle DBA at BuildOnline – a document management SaaS company serving the construction industry – I discovered what a production database crisis actually looks like. The pagers went off. The monitoring screen in the corner of the room turned crimson. Engineers converged. My boss’s face cycled through its own alarming colour palette.

The problem was quickly identified: application users had been hitting search timeouts and, in frustration, had started hammering the refresh button – resubmitting the same failing queries over and over until the service buckled completely under the accumulated load. Everyone in that room knew exactly what was happening. The dashboard said so. The pagers had said so. The instinct to converge, to diagnose, to act – it was all already running before anyone had fully understood the problem.

As it happened, the automatic clustering software eventually rebooted the database onto its standby node, killing every in-progress query at a stroke. This forced all the users to log back in simultaneously – and since many of them were construction workers who had to retrieve their passwords from a post-it note on the side of the terminal, the frantic search traffic simply stopped. The database problem resolved itself. I was briefly celebrated as a hero by colleagues who didn’t know enough to ask questions. My ops colleagues said nothing. There is an unspoken rule in operations teams: never let anyone outside the team know just how fragile things actually are.

That image – the crimson screen, the converging team, the unmistakable signal demanding immediate response – is the mental model most of us carry for how complex systems fail. Something breaks. Something announces it. People respond.

Agentic AI is introducing a different failure mode. Nothing breaks. Nothing announces itself. But something has changed – and the first signal is a line in the cloud bill.

The Wrong Cost Story

The current conversation about AI costs is dominated by a specific story: token consumption. Model API fees. The cost of inference at scale. That story is real – Uber reportedly burned through its entire 2026 AI budget in four months, driven by engineering teams adopting agentic coding tools faster than anyone had forecast. That is a genuine and significant cost challenge.

It is also not the story this article is about.

The costs this series has been tracking land somewhere different. As examined in the earlier piece on the hidden cost of letting AI agents query live systems, the pressure agentic workloads place on data infrastructure doesn’t show up in the AI budget at all. It appears in the infrastructure bill – the database tier, the compute layer, the autoscaling events. These costs are driven by changed load profiles, application layer pressure and overprovisioning. They are not token costs. They are owned by a different team, reviewed by a different process, investigated through a different lens.

That separation matters, because it means the signal arrives in the wrong place.

The Green Dashboard Problem

The load shape that agentic workloads impose on database infrastructure has been a thread running through this series ever since the earlier piece arguing that AI agents don’t just add load – they change its shape. Agents don’t query databases the way applications do. They query for context, for history, for state – repeatedly, unpredictably and without the natural rate-limiting that human interaction imposes. The application layer was designed around human pacing: the pauses, the abandoned workflows, the natural gaps between requests. Agents remove those gaps. Under sustained agentic load, that layer doesn’t fail cleanly. It throttles erratically. It passes pressure downstream. It absorbs demand without surfacing errors.
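
To make the difference in shape concrete, here is a minimal sketch in Python. The two traces are invented for illustration (they are not measurements from any real system), and load_shape is a hypothetical helper, not part of any monitoring product.

```python
def load_shape(requests_per_second):
    """Summarise a per-second request trace: mean, peak and idle fraction."""
    n = len(requests_per_second)
    mean = sum(requests_per_second) / n
    peak = max(requests_per_second)
    idle = sum(1 for r in requests_per_second if r == 0) / n
    return {"mean": mean, "peak": peak, "idle_fraction": idle}


# Hypothetical human-paced session: a short burst, then think time (60 seconds).
human = [4, 2, 0, 0, 0, 0, 0, 0, 0, 0] * 6

# Hypothetical agent loop: a steady rate with no gaps (60 seconds).
agent = [3] * 60

print("human:", load_shape(human))
print("agent:", load_shape(agent))
```

In this made-up example the agent's peak is lower than the human's, so an alert keyed to peak request rate would stay quiet, yet the agent's sustained mean is five times higher and the idle time has gone to zero. That is the kind of change a peak-based threshold is not built to notice.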

None of that registers as an incident. Queries complete. Latency holds within SLA. The monitoring dashboard stays green, for now. But underneath that green surface, the capacity model is being violated in ways it wasn’t designed to detect – as explored in detail when looking at what happens when agents connect to infrastructure sized for humans.

The responsible infrastructure response to an uncertain load envelope is overprovisioning. Reserve more capacity than you currently need. Scale earlier than the thresholds strictly require. That response is correct – and it is also expensive. Overprovisioning shows up as cost before it shows up as anything else.
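
A back-of-the-envelope sketch of that arithmetic, in Python. Every number here is an assumption chosen for round figures (the peak, the per-replica capacity, the unit price and both headroom factors), and provisioned_units and monthly_cost are hypothetical helpers rather than any cloud provider's API.

```python
import math


def provisioned_units(expected_peak, headroom, unit_capacity):
    """Units needed so that capacity covers expected_peak * (1 + headroom)."""
    return math.ceil(expected_peak * (1 + headroom) / unit_capacity)


def monthly_cost(units, unit_price):
    """Flat monthly cost for a number of identical units."""
    return units * unit_price


# All values below are illustrative assumptions, not real pricing.
EXPECTED_PEAK_QPS = 8000   # assumed peak queries per second
UNIT_CAPACITY_QPS = 1000   # assumed capacity of one replica
UNIT_PRICE = 500           # assumed monthly price per replica

# Well-understood load: 25% headroom. Uncertain load envelope: 100% headroom.
known = provisioned_units(EXPECTED_PEAK_QPS, 0.25, UNIT_CAPACITY_QPS)
uncertain = provisioned_units(EXPECTED_PEAK_QPS, 1.0, UNIT_CAPACITY_QPS)

print(known, monthly_cost(known, UNIT_PRICE))
print(uncertain, monthly_cost(uncertain, UNIT_PRICE))
```

Under these assumptions the cautious plan reserves 16 replicas instead of 10, and the monthly figure moves from 5000 to 8000 without a single failed query to explain it.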

The bill rises. Nothing is broken.

The Signal in the Wrong Queue

There is a second mechanism at work, quieter still. Agents operating on inherited or ambient credentials don’t trigger identity alerts – but they do generate database activity in patterns that drive autoscaling events. With no human in the loop, feedback loops that human hesitation would once have damped instead sustain load in ways that short-burst human sessions never do. The audit trail accumulates gaps that nobody has flagged as a problem. Aurora read replicas scale out. Connection pools expand. Provisioned IOPS quietly breach their baseline.

Billing systems capture all of this before monitoring systems do. The cost anomaly is the earliest signal available.

The dangerous property of this situation is not the cost itself. It is how cost anomalies are investigated. A performance alert triggers incident response – immediate, cross-functional, focused on restoration. A cost anomaly triggers a finance review: a tagging audit, a conversation with the cloud account team, a question about whether someone left a development environment running. The response cadence is calibrated for misconfigurations and runaway jobs. It is not calibrated for shifts in the underlying architecture that are operating exactly as intended.

By the time the investigation arrives at the actual cause – agentic load patterns that have quietly reshaped the infrastructure envelope – the cost baseline has typically already reset upward. The conditions producing the anomaly have become the new baseline. The anomaly is no longer anomalous.
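
The way an anomaly stops being anomalous can be sketched with a simple trailing-average detector. This is an assumed, deliberately naive design, not how any particular FinOps tool works; the 7-day window, the 1.3x threshold and the cost series are all invented for illustration.

```python
def flag_anomalies(daily_cost, window=7, threshold=1.3):
    """Flag days whose cost exceeds threshold x the trailing-window mean."""
    flags = []
    for i, cost in enumerate(daily_cost):
        if i < window:
            flags.append(False)  # not enough history to form a baseline yet
            continue
        baseline = sum(daily_cost[i - window:i]) / window
        flags.append(cost > threshold * baseline)
    return flags


# Hypothetical series: flat at 100, then a permanent step up to 150 on day 10.
daily_cost = [100] * 10 + [150] * 14

flags = flag_anomalies(daily_cost)
flagged_days = [day for day, flagged in enumerate(flags) if flagged]
print(flagged_days)  # [10, 11, 12]
```

The step change is flagged for three days. After that the trailing window has absorbed enough of the higher figure that 150 no longer clears the threshold, even though spend is still 50% above where it started. Nothing was fixed; the baseline simply moved.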

The Budget That Doesn’t See the Whole Picture

There is a third property worth naming, because it determines why this problem persists rather than self-correcting.

The cost and the benefit land in different parts of the organisation. The team experiencing rising database infrastructure costs is not the team celebrating AI productivity gains. Those gains are real – development velocity is up, cycle times are shorter, engineers are more productive. The cost of that productivity is landing in infrastructure budgets, measured by infrastructure teams, investigated through infrastructure processes. Nobody is looking at the full picture, because it spans organisational boundaries that cost accounting rarely crosses.

This is also where the economics of agentic AI become distinct from everything that came before. Ben Thompson, writing about the infrastructure implications of AI inference, has identified a meaningful distinction between two workload classes: answer inference (a model responding to a human query, latency-sensitive and human-paced) and agentic inference (agents doing work autonomously, scaling with compute rather than with people, and running without the natural rate-limiting that human interaction provides). Thompson arrives at this distinction from the compute architecture side, examining what the shift means for chip design and cloud economics. This series has been arriving at the same distinction from the data infrastructure side – examining what that workload class does to the systems of record that agents depend on for state, context and history. Both paths identify the same fundamental break: agentic inference doesn’t scale like anything enterprises have priced, provisioned or monitored for before.

The economic consequence of that workload class is now arriving – not as an AI line item, but as a database infrastructure anomaly that nobody’s incident playbook was written to handle.

What the Cost Is Actually Telling You

FinOps tooling is evolving. Cost-aware observability platforms are beginning to correlate infrastructure spend with workload attribution in ways that were difficult even two years ago. Those developments are real and worth tracking.

But the harder problem is not tooling. It is conceptual. Cost anomaly investigation processes were designed around a specific failure model: something misconfigured, something runaway, something that should not be happening. The costs this article describes are not that. They are the cost of agentic AI doing exactly what it was deployed to do – querying systems of record for context, maintaining state, operating continuously without human intervention.

The cost isn’t a bug. It’s what the new normal costs.

That distinction – between a cost anomaly that signals something broken and a cost anomaly that signals something fundamental – is what finance teams, platform owners and the people responsible for systems of record need to understand. Because by the time the monitoring dashboard tells them something is wrong, the architecture will already have shifted around them.

At BuildOnline, the crisis announced itself loudly and all at once. The pagers, the crimson screen, the converging team – no subtlety was required to read any of it. The system had the decency to fail visibly.

Agentic AI will not extend that courtesy.

This article is part of the Databases in the Age of AI series.