There is a moment, familiar to anyone who has run infrastructure at scale, when the numbers stop making sense. A cloud cost review lands on your desk and the database line is up thirty percent quarter-on-quarter. Nobody changed the schema. No new product launched. The application team says traffic is flat. The platform team points at autoscaling events. Everyone agrees it shouldn’t look like this, and nobody can explain why it does.
That moment is arriving in enterprise data teams everywhere. The explanation isn’t a bug. It isn’t a misconfiguration. It’s that the sizing assumptions the database was built around no longer describe the load it is running. Earlier pieces in this series examined what agent load does to database behaviour. This one is about what it does to the forecast.
The assumptions nobody wrote down
Every enterprise database capacity model rested on a set of assumptions so self-evident that nobody stated them explicitly. Load arrived from human users. Humans have think-time – the pause between seeing a screen, understanding it, deciding what to do and pressing the next button. Load arrived in predictable patterns, peaking during business hours and collapsing overnight. Even when human actions triggered workflows, the initiating pattern was still human-scale: one person, one decision, some amount of intent behind the write. Fan-out was bounded – a user touching one part of the system rarely triggered cascading reads across a dozen related tables.
These assumptions weren’t documented in your sizing exercises because they didn’t need to be. They were built into the methodology itself, derived from years of observed behaviour. Peak load was a worst-case scenario built from human-scale extremes. Connection pools were sized for the maximum number of concurrent human sessions the business could realistically expect. Locking and transaction management were designed around workloads where individual operations were paced by human cognition.
Nobody questioned those foundations because, for the entire history of enterprise computing, they were correct.
Agents remove the pacing
AI agents don’t arrive with think-time. They fan out across whatever data they need. They do it recursively, in parallel and at machine speed. The natural pacing that human behaviour provided simply disappears.
The mechanics are not mysterious. Batch work collapses into real-time requests. A single prompt fans out across tables, schemas and services. An agent queries, evaluates the answer, decides it needs more context and queries again. From the application interface, this may still look like one user action. At the database layer, it is something else entirely. The point is not to relitigate those patterns here – it is that each one invalidates the historical data used to forecast capacity.
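The query-evaluate-query-again loop can be sketched in a few lines. Everything here is illustrative – the toy schema graph, the table names, the traversal – but it shows how one user-visible action becomes several round-trips at the database layer:

```python
# Hypothetical sketch of agent fan-out: one prompt triggers recursive
# context-gathering. The schema graph and names are invented for illustration.

def run_agent_action(prompt: str) -> tuple[str, int]:
    """Simulate an agent gathering context until it runs out of leads.
    Returns the answer and the number of database round-trips."""
    related = {  # toy relationship graph: each read reveals more tables
        "orders": ["customers", "line_items"],
        "customers": ["addresses"],
        "line_items": ["products"],
    }
    queries = 0
    context = []
    frontier = ["orders"]  # the agent's starting point
    while frontier:
        table = frontier.pop()
        queries += 1  # each hop is a real query at the database layer
        context.append(f"rows from {table}")
        frontier.extend(related.get(table, []))
    return f"answer built from {len(context)} result sets", queries

answer, n = run_agent_action("summarise recent orders")
print(f"one user action, {n} database round-trips")
```

Five tables touched, five queries issued, one “action” in the application logs. A real agent framework decides the next hop with a model call rather than a lookup table, which makes the fan-out less predictable, not more.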
Agent load does not sit at the edge of the old planning envelope. It sits outside it.
Concurrency assumptions face the same problem from a different direction. Connection pooling was designed around maximum concurrent human sessions. An agent framework under load doesn’t behave like a large number of humans; it behaves like software calling software, with no human pause in the loop. Connection pools sized for human concurrency can be exhausted by a modest agent workload before the capacity alarm triggers. Lock contention patterns change. Transaction pacing – the implicit assumption that operations arrive with gaps between them – disappears. The result is interference with every other workload on the system, not because the database is failing but because its concurrency design was never built for this.
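The gap between human and agent concurrency can be put in rough numbers with Little’s law (L = λ × W): a session consumes a connection only while a query is actually in flight. The figures below are illustrative assumptions, not measurements, and the model ignores queueing effects:

```python
# Back-of-envelope sketch using Little's law. All numbers are assumptions
# chosen for illustration, not benchmarks.

query_time = 0.05        # seconds a query holds a connection
human_think_time = 10.0  # pause between one human's successive queries
pool_size = 100          # pool sized for the human workload

# A human session holds a connection only while a query runs.
human_utilisation = query_time / (query_time + human_think_time)
human_sessions_supported = pool_size / human_utilisation

# An agent issues its next query as soon as the last one returns:
# no think-time, so each session holds a connection almost continuously.
agent_utilisation = 1.0
agent_sessions_supported = pool_size / agent_utilisation

print(f"human sessions the pool can carry: ~{human_sessions_supported:.0f}")
print(f"agent sessions the pool can carry: ~{agent_sessions_supported:.0f}")
```

Under these assumptions, the same 100-connection pool that comfortably carries roughly 20,000 human sessions saturates at 100 agent sessions. The pool isn’t undersized for humans; it was never sized for sessions that hold a connection continuously.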
The capacity failure and the concurrency failure don’t arrive separately. Agents violate both sets of assumptions simultaneously.
The cloud turns failure into a monthly cost
The two failures converge in one place: the bill.
The shift from on-premises infrastructure to cloud services changed the relationship between capacity decisions and pricing decisions. On-premises, overprovisioning was a capital planning problem – you bought too much hardware, it sat underutilised, you wrote it off. Uncomfortable, but bounded. In the cloud, overprovisioning is a recurring line item. Every compute unit you provision for load the old plan did not anticipate shows up in next month’s bill. And the month after that.
When agent load sits outside the modelled envelope, teams face a binary choice: accept availability degradation as the system exhausts the resources it was sized for, or overprovision to cover the gap. Responsible teams overprovision. They have to. Nobody wants to be the person who held the line on cost while the order system, claims platform or payments engine fell over.
But overprovisioning in this context isn’t a tactical response to a spike. It’s a structural adjustment to the fact that the old sizing assumptions are no longer valid. The agent load profile isn’t an exception. It’s what the system looks like now.
This is also where the removal of human mediation has direct economic expression. As explored in an earlier piece, human users were natural rate-limiters. Their think-time, their bounded sessions, their predictable behaviour held the cost envelope in place without anyone designing that property into the system. Agents remove it. The cost structure that seemed stable wasn’t engineered – it was a side effect of who was generating the load. That side effect is gone.
It is the same pattern as the hidden cost of letting agents query live systems: costs arriving through mechanisms the old model does not track, at a scale it did not anticipate, becoming permanent fixtures in the cost base.
The limit of observability
The industry will respond with dashboards. It always does. There will be agent-aware observability, workload classification, separate cost attribution and better database telemetry. Some of that will be useful.
None of it changes the underlying problem.
Capacity and concurrency assumptions are built from historical load data. They are, by construction, backward-looking. They extrapolate from what has been observed to predict what should be planned for. The problem with agent load isn’t that the tools to observe it are immature – it’s that even perfect observability of the past wouldn’t produce a valid forecast for the future, because agents are generating load that has no historical precedent in most enterprise environments.
You can’t build a planning model from data that doesn’t exist yet. And until you have enough agent-generated load history to construct a new baseline – enough to understand the compression patterns, the fan-out ratios, the feedback-loop characteristics specific to your workload – your sizing assumptions remain an artefact of a world that no longer exists, a model that no longer describes what is happening to your system.
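To make “backward-looking” concrete, here is a toy version of the standard exercise – size capacity from a percentile of historical load plus headroom – fed a year of synthetic human-paced history and then confronted with an unpaced agent burst. Every number is invented; the shape of the failure is the point:

```python
# Sketch of percentile-based capacity planning over synthetic history.
# All figures are made up for illustration.
import random

random.seed(1)
# A year of hourly peak QPS from a human-paced workload, clustered ~1000.
history = sorted(random.gauss(1000, 80) for _ in range(24 * 365))
p99 = history[int(len(history) * 0.99)]
planned_capacity = p99 * 1.3  # a typical headroom multiplier

# One agent workflow fanning out with no pacing: a burst the history
# contains no precedent for (synthetic figure).
agent_burst_qps = 5000

print(f"planned capacity: {planned_capacity:.0f} QPS")
print(f"agent burst:      {agent_burst_qps} QPS")
assert agent_burst_qps > planned_capacity  # outside the modelled envelope
```

The forecast is internally consistent and wrong: every input was human-paced, so no percentile of that history, with any reasonable headroom, predicts the agent burst.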
The bill arrives before the model catches up.
In the cloud, that gap is not theoretical. It renews every month.
This article is part of the Databases in the Age of AI series.