Decades of Change Management. None of It Was Built for Agentic AI

The meeting had been running for forty minutes and everyone in the room knew something was wrong – they just couldn’t name it yet.

The change advisory board was reviewing a request to deploy an agent-based workflow into production. The documentation was complete. The testing summary was thorough. The UAT sign-off had come back with no blocking issues. The release manager had run through the checklist: test coverage, rollback procedure, deployment window, notification plan. The approver had asked the right questions. The answers had been technically accurate.

And yet the room felt uncomfortable in a way that nobody had articulated.

The agent had been tested. But the tests had produced different outputs on different runs with the same inputs. Not dramatically different – within acceptable ranges, the team said. The UAT testers had signed off because the behaviour they’d observed was acceptable. Nobody had flagged that “acceptable in the instances we observed” was doing quite a lot of work in that sentence. Nobody had asked what the approval was actually certifying. The process had continued as designed, generating documentation that looked like assurance.

What the room was sensing, without the vocabulary to name it, was that the change management apparatus had just been asked to certify something it was never built to certify.


The Assumption That Was Never Written Down

Enterprise change management is built on an assumption so foundational that it was never made explicit – because it never needed to be. The assumption is this: software behaves reproducibly. Given the same inputs, a system produces the same outputs. That reproducibility is what makes testing meaningful and what makes an approval decision valid.

That assumption is load-bearing in ways that only become visible when it’s removed.

Test environments work because production behaviour is reproducible from test behaviour. Regression suites work because a test that passes once will pass again, and the behaviour it verified will hold in production. UAT sign-off works because observed behaviour is sufficient grounds for confidence in future behaviour. Rollback procedures work because returning to a previous version means returning to a known state. Approval gates work because the approver can certify, on the basis of evidence, that the system behaves correctly.

Each of those statements is only true if the system behaves reproducibly. Introduce non-determinism and the assumptions underpinning the chain no longer hold. The evidence that testing produces is no longer evidence of what the system will do in production. It is evidence of what the system did in testing. For software that behaves reproducibly, those are the same thing. For a probabilistic system, they are not.

This is why the change advisory board meeting described above felt wrong without anyone being able to say why. The process had been followed. The documentation was complete. But the documentation was recording observations from a system whose behaviour, by design, cannot be fully characterised from observation.


What the Approver Is Being Asked to Certify

The human approver in a change governance process is doing more than following a checklist. They are exercising judgment – they are deciding whether the evidence before them is sufficient grounds for confidence that the system will behave correctly in production.

That judgment has always rested on a specific kind of evidence: the system did X in testing, therefore it will do X in production. The approver’s signature is a statement that this inference is valid.

As the series has explored in the context of the hidden costs of agentic workloads, agentic AI doesn’t interact with production systems the way traditional software does. But the problem here is more fundamental than behaviour at the database tier. Most agentic systems wrap the probabilistic model in deterministic orchestration – the workflow logic, the tool calls, the guardrails. The governance challenge isn’t in certifying those layers. It’s in certifying the reasoning at the centre, which draws its responses from a probability distribution. Any individual output is one sample from that distribution, not a reproducible result.
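To make the distinction between the deterministic shell and the probabilistic centre concrete, here is a minimal Python sketch built on a toy model. The `reason` function stands in for the probabilistic core and samples one action from a fixed, hypothetical distribution. The `orchestrate` function stands in for the deterministic workflow logic and guardrail around it. The names, actions and weights are illustrative only and do not correspond to any real agent framework or model API.

```python
import random
from collections import Counter

# Hypothetical distribution over actions for one fixed request.
# Real models are far more complex; this only illustrates
# "each output is one sample from a distribution".
ACTION_WEIGHTS = {
    "approve_refund": 0.90,
    "escalate_to_human": 0.08,
    "reject_refund": 0.02,
}

def reason(request: str, rng: random.Random) -> str:
    """Toy probabilistic core: the same request can yield different actions.

    The request text is ignored in this toy; only the sampling matters.
    """
    actions = list(ACTION_WEIGHTS)
    weights = list(ACTION_WEIGHTS.values())
    return rng.choices(actions, weights=weights, k=1)[0]

def orchestrate(request: str, rng: random.Random) -> str:
    """Toy deterministic wrapper: fixed workflow logic plus a guardrail."""
    if not request.strip():
        # Guardrail: identical behaviour on every run, easy to certify.
        return "rejected_empty_request"
    # The non-reproducible step: one draw from the distribution above.
    return reason(request, rng)

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each execution samples differently
    tally = Counter(orchestrate("refund order 1234", rng) for _ in range(1000))
    print(tally)
```

Running the script several times prints different tallies for the same input. The guardrail path behaves identically on every run; only the sampled action varies, and that is the part a test run can observe but not pin down.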

The approver is no longer certifying correctness. They are certifying that a range of possible behaviours is probably acceptable – even though that range has not been fully characterised. Enterprise governance has no mechanism for that decision. The checkbox was never designed to contain it.

This is not a failure of the people in the room. It is a failure of the question the process is asking them to answer. The question changed. The process didn’t.


The Chain Breaks at the Same Moment

The cascade logic matters here, because it explains why the problem is not incremental.

Testing relies on reproducibility to produce evidence. Governance consumes that evidence. The approval decision is only meaningful if the testing was meaningful, and the testing is only meaningful if the behaviour it observed is the behaviour that will occur in production. Non-determinism breaks that chain – not at one point, but everywhere simultaneously.

This is why testing and release governance cannot be separated when thinking about agentic AI. They fail together, for the same reason, at the same moment.

An earlier article in this series, “Transactions assume intent – and agents don’t guarantee it”, identified one dimension of the problem: the database has no mechanism to distinguish a well-reasoned human decision from a machine action based on misread context. The change management problem is the same problem one layer up. The governance process has no mechanism to distinguish a system whose behaviour has been validated from a system whose behaviour has merely been sampled. Both produce documentation that looks like assurance. Only one of them is.


Why Systems of Record Bear the Highest Risk

Non-determinism is not uniformly dangerous across all systems. A probabilistic customer-facing chatbot produces variable outputs that users experience as personality variation. The experience varies, but nothing permanent changes: no system of record has been altered.

A probabilistic agent making decisions that commit to a system of record is a different matter entirely. Its variable outputs become permanent enterprise reality. The stakes of non-determinism are qualitatively different when the output is a database write rather than a conversational response – as the audit trail problem makes clear. When an agent commits a transaction based on reasoning that was sound in most contexts but not this one, the audit trail records what happened. It cannot tell you whether the behaviour was within the bounds that governance was supposed to certify, because governance was never able to define those bounds in terms the system could express.

Change management protecting systems of record was designed for software that behaves reproducibly. The approval process certifies correctness because correctness is categorical – the system either does the right thing or it doesn’t. Probabilistic systems don’t have categorical correctness. They have distributions of behaviour, and some parts of the tail of that distribution may not have been sampled during testing.
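How large the unsampled tail can be is easy to estimate with elementary probability, under strong simplifying assumptions: each test run is independent, and a given behaviour occurs with a fixed per-run probability p. Real agent behaviour depends on context and is unlikely to be that tidy, so treat this Python sketch as an illustration rather than a test-planning method. The probability values used are hypothetical.

```python
import math

def miss_probability(p: float, n: int) -> float:
    """Chance that a behaviour with per-run probability p is never
    observed in n independent test runs: (1 - p) ** n."""
    return (1.0 - p) ** n

def runs_needed(p: float, confidence: float) -> int:
    """Smallest n such that the behaviour is observed at least once with
    the given confidence, i.e. 1 - (1 - p) ** n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

def rule_of_three_upper_bound(n: int) -> float:
    """Approximate 95% upper bound on p after n runs with zero
    occurrences (the statistical 'rule of three')."""
    return 3.0 / n

if __name__ == "__main__":
    # Hypothetical behaviour seen once per 1,000 runs, tested 100 times.
    print(f"{miss_probability(0.001, 100):.3f}")  # 0.905
    print(runs_needed(0.001, 0.95))               # 2995
    print(rule_of_three_upper_bound(100))         # 0.03
```

Under these assumptions, a behaviour that occurs once in a thousand runs has roughly a 90% chance of never appearing in a hundred-run test campaign, and about three thousand runs are needed to observe it even once with 95% confidence. The rule of three works in the other direction: a hundred clean runs only support the claim that the behaviour's rate is below roughly 3%, at 95% confidence.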

In regulated industries – financial services, healthcare, critical infrastructure – this matters beyond the operational. Change governance processes are not just internal quality controls. They are frequently compliance requirements, part of the evidence that an organisation can produce when a regulator or auditor asks how a change was validated before it reached production. If the process can no longer certify what it claims to certify, the organisation may be carrying degraded regulatory standing without knowing it.


The Human Judgment That Lost Its Anchor

This series has identified several places where human presence was doing structural work that was never acknowledged as structural. The brake was human – the natural rate-limiting that human interaction imposed on data systems was a protection that nobody designed and that agents remove without replacing. Human identity was load-bearing in access control and audit. Human intent was the implicit validation layer behind every committed transaction.

Change management is another one of those places. The approver signing off a UAT review wasn’t just following process. They were exercising epistemic closure – deciding that the available evidence was sufficient to certify the system’s production behaviour. That judgment was anchored in the reproducibility assumption. The evidence was trustworthy because the system was deterministic. Remove the anchor and the judgment becomes unmoored.

The people responsible for the systems that agents reach into are often not the same people driving agent deployment. But the approver in a change governance process – even when they have full operational authority over the production system – is being asked to make a judgment for which the evidence base is now structurally incomplete. They are not less capable than they were. The question underneath them changed.


The Appearance of Rigour Without the Substance

The industry is reaching toward responses. Probabilistic testing frameworks. Shadow deployments. Canary releases. Continuous behavioural monitoring. These are meaningful developments, and in time some of them will mature into approaches that provide genuine assurance for agentic systems.

None of them yet resolves the governance question. They defer it.

A canary release tells you how the system behaved in a limited production exposure. Continuous monitoring tells you how it has been behaving since deployment. Neither tells you, before the deployment decision is made, what distribution of behaviours to expect and whether that distribution is acceptable for a system making irrevocable commits to enterprise data. The approval still has to be made by someone, against some standard, at a specific moment in time. That moment is what the existing governance apparatus was built to serve – and it is exactly where the apparatus fails.

Until enterprise governance frameworks develop a coherent model for certifying probabilistic systems – and no such model yet exists in production-ready form for systems of record specifically – change management will continue to be applied to agentic AI deployments in ways that generate documentation, satisfy checklists and move approvals through queues, while providing the appearance of rigour without the substance.

The meeting described at the start of this article will keep happening. The room will keep feeling wrong. And the process will keep running to completion.


This article is part of the Databases in the Age of AI series.
