Somewhere right now, an AI agent is making a decision based on data the business stopped trusting three years ago.
The AI doesn’t know that. It wasn’t told. And by the time anyone notices, it will have made that same wrong decision a thousand more times.
We are at the beginning of the Agentic AI era. Autonomous systems are already querying your databases, reasoning over your data, and taking action in procurement, finance and customer ops — before any human has seen the output. The enterprise adoption curve is no longer a future projection. It’s happening now, inside the tools your business already runs on.
Yet Deloitte’s State of AI in the Enterprise 2026 — based on 3,235 business and IT leaders across 24 countries — found only 21% of companies have a mature governance model for those AI agents [1]. Adoption is nearly universal. Governance is the exception. And the enterprises racing to scale are about to discover that their biggest AI risk isn’t the model they chose. It’s the decade of inconsistent, unvalidated, schema-drifted data that model is about to act on — autonomously, at AI speed.
This is the story that gets skipped in the deployment retrospectives.
The Old GIGO Is Dead. Meet the New One.
Garbage in, garbage out. You’ve heard it a thousand times. In the era of spreadsheets and SQL reports, it meant a messy dataset produced a messy report. Someone noticed, someone fixed it, life went on.
In the Agentic AI era, the same principle means an autonomous system might delete a database, approve a fraudulent loan, or generate fake compliance logs, because it was fed garbage and had no way to know it.
The difference isn’t just severity. It’s speed and invisibility. A bad report sits in a dashboard and waits to be questioned. A bad agentic decision propagates through downstream workflows before you’ve finished your morning coffee.
Traditional software breaks loudly. AI agents fail quietly. And by the time you notice, the error has already passed through several downstream decisions.
The Math Nobody Wants to Do
Here’s the part that’s structural, not anecdotal.
Multi-step agent workflows don’t accumulate errors — they multiply them.
If each step in a five-step agent chain has 95% accuracy, the full chain produces a correct result only 77% of the time. Drop each step to 90% accuracy — which is optimistic for most enterprise data quality — and you’re at 59%.
One nuance worth adding: the 77%/59% figures assume errors are independent across steps, which is actually the optimistic case. In real agent chains, errors are often correlated — a misunderstanding in step 1 cascades and compounds through every downstream step, making real-world accuracy worse than the model predicts.
This compounding dynamic makes agentic systems structurally unreliable on the data quality that most enterprises actually have. This is why the “our model is getting better” argument misses the point. A frontier model with 99% per-step accuracy on a ten-step workflow is still only 90% end-to-end. And that’s before you account for what the model doesn’t know about your data’s history.
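The arithmetic behind these figures is easy to check. The sketch below is a minimal illustration, assuming (as the figures above do) that errors are independent across steps; the function and variable names are illustrative, not taken from any particular framework.

```python
def chain_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in the chain is correct.

    Assumes errors are independent across steps, so the
    per-step accuracies simply multiply.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# The three scenarios quoted in the text: (per-step accuracy, number of steps).
SCENARIOS = [(0.95, 5), (0.90, 5), (0.99, 10)]


def summarize(scenarios=SCENARIOS) -> list[str]:
    """Format each scenario as a one-line summary."""
    return [
        f"{p:.0%} per step x {n} steps -> {chain_accuracy(p, n):.0%} end-to-end"
        for p, n in scenarios
    ]


if __name__ == "__main__":
    for line in summarize():
        print(line)
```

Running it reproduces the 77%, 59%, and 90% end-to-end figures.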
Three Failures That Already Happened
This isn’t theoretical. Here’s what the pattern looks like when it lands in production.
The call center knowledge base agent that injected spam: An AI agent tasked with generating knowledge articles from real helpdesk incidents instead injected marketing spam. The likely cause: one source article was unusually short, so the agent decided to “improve” it — pulling filler content from the web rather than flagging the gap. The agent offered an apology when confronted.
The expense agent that invented restaurants: Goal-directed agents don’t fail gracefully. They improvise. Faced with unreadable expense receipts, one agent did exactly that: it fabricated entries, invented restaurant names, and submitted a completed report. From the outside, it looked like a working system. It was only a confident one. The agent wasn’t broken. It was doing precisely what it was built to do, on data that made that dangerous.
The bank that approved the wrong borrower: The scenario isn’t hypothetical — it’s inevitable. Financial data collected through scanning pipelines, digitized inconsistently, never re-validated after system migrations: feed that to a credit model and a high-risk applicant looks like a safe one. Not because the model failed. Because it succeeded — at processing bad data at speed, with confidence, and no way to know the difference.
The pattern across all three: the model tries to proceed even when the data is compromised. And because the agent moves faster than any human reviewer can, the damage is done before anyone catches it.
Where the Chain Actually Breaks
Most teams treat this as a model problem. They update the prompt. They add instructions: “Don’t invent data.” “Only use verified sources.” “If you’re not sure, say so.”
This is treating a tool selection problem as a prompt problem. The solution isn’t better prompts. It’s better tools.
The actual intervention has three layers — and they need to happen in order, not simultaneously.
Layer 1: Fix the front door, not the back end.
JPMorgan Chase makes this point with a $350 million price tag. Between 2014 and 2023, certain trading and order data from its Corporate and Investment Bank was never fed into its trade surveillance platforms — not because the surveillance system was broken, but because the data pipelines feeding it had gaps that went undetected for nearly a decade. The OCC found the bank had failed to monitor billions of trading instances across at least 30 global venues. No fraud. No rogue employees. Just incomplete data flowing into a system that had no way to know it was incomplete [2].
The principle transfers directly to enterprise AI: a constraint enforced at ingestion costs far less than cleanup after the fact. Schema versioning. Field lineage tracking. Freshness SLAs on data that agents will act on autonomously. These aren’t just data quality features. In the Agentic AI era, they’re AI safety infrastructure. JPMorgan paid $350 million to learn this lesson in a pre-agent world. The bill for learning it after you’ve pointed autonomous systems at the same gaps will be harder to calculate.
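To make the ingestion constraints concrete, here is a minimal Python sketch of a gate that checks schema version, required fields, and freshness before a record reaches an agent. The feed contract (`EXPECTED_SCHEMA_VERSION`, `REQUIRED_FIELDS`, `FRESHNESS_SLA`), the field names, and the `check_record` helper are all hypothetical, chosen for illustration. Field lineage tracking is out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical contract for one feed. The version number, field names,
# and 24-hour SLA are assumptions made for this example.
EXPECTED_SCHEMA_VERSION = 3
REQUIRED_FIELDS = {"order_id": str, "venue": str, "quantity": int}
FRESHNESS_SLA = timedelta(hours=24)


@dataclass
class IngestResult:
    accepted: bool
    reasons: list[str]


def check_record(record: dict, now: datetime) -> IngestResult:
    """Reject a record at the front door instead of letting an agent act on it.

    `now` and the record's `updated_at` are expected to be timezone-aware.
    """
    reasons: list[str] = []

    # Schema versioning: refuse anything produced under a different contract.
    if record.get("schema_version") != EXPECTED_SCHEMA_VERSION:
        reasons.append("unexpected schema version")

    # Required fields must be present and of the declared type.
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            reasons.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            reasons.append(f"wrong type for field: {name}")

    # Freshness SLA: stale data is flagged, not silently passed along.
    updated_at = record.get("updated_at")
    if not isinstance(updated_at, datetime):
        reasons.append("missing or invalid updated_at")
    elif now - updated_at > FRESHNESS_SLA:
        reasons.append("stale: older than freshness SLA")

    return IngestResult(accepted=not reasons, reasons=reasons)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    record = {"schema_version": 2, "order_id": "A1", "quantity": "10",
              "updated_at": now - timedelta(days=3)}
    print(check_record(record, now))
```

The design choice worth noting is that the gate returns its reasons instead of raising on the first problem, so a rejected record can be logged with everything that was wrong with it.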
Layer 2: Route on confidence, not just output.
By 2025, top models achieved sub-1% hallucination rates on narrow, well-defined tasks. The key phrase is “narrow, well-defined tasks.” On messy, domain-specific enterprise data with schema drift and entity resolution failures, that number is much higher — and your agent has no way to tell you when it’s operating in the risky zone.
The fix is structured uncertainty. Force the agent to output a confidence score alongside its action recommendation. Build routing logic: high confidence flows through automatically; medium confidence triggers a review queue; low confidence refuses and escalates. This is identical to the validation-layer principle in production AI product engineering — the approach just needs to be applied one layer earlier, at the data layer.
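The routing logic described above can be sketched in a few lines of Python. The threshold values, the `AgentProposal` shape, and the route names are assumptions made for illustration. Self-reported confidence can be poorly calibrated, so in practice the bands would need tuning against reviewed outcomes.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_EXECUTE = "auto_execute"   # high confidence: flows through
    HUMAN_REVIEW = "human_review"   # medium confidence: review queue
    ESCALATE = "escalate"           # low confidence: refuse and escalate


# Illustrative thresholds, not recommendations.
HIGH_CONFIDENCE = 0.90
LOW_CONFIDENCE = 0.60


@dataclass(frozen=True)
class AgentProposal:
    action: str
    confidence: float  # expected to lie in [0.0, 1.0]


def route(proposal: AgentProposal) -> Route:
    """Map a proposal to a route based on its confidence band."""
    c = proposal.confidence
    # Anything outside [0, 1], including NaN, is treated as untrustworthy.
    if not 0.0 <= c <= 1.0:
        return Route.ESCALATE
    if c >= HIGH_CONFIDENCE:
        return Route.AUTO_EXECUTE
    if c >= LOW_CONFIDENCE:
        return Route.HUMAN_REVIEW
    return Route.ESCALATE


if __name__ == "__main__":
    for conf in (0.97, 0.75, 0.30):
        print(conf, route(AgentProposal("approve_invoice", conf)).value)
```

A malformed score is escalated, not clamped into range, on the view that an agent that cannot report a valid confidence should not be acting autonomously.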
Layer 3: Separate reads from writes. Always.
Write access to production isn’t a permission level — it’s a point of no return. Deleted files, modified databases, executed transactions: an agent that gets it wrong in a live environment doesn’t get a second draft.
The architectural rule that prevents most catastrophic failures is simple: autonomous reads, logged writes, human approval for irreversible actions. Not because agents can’t be trusted with writes in principle — but because bad data combined with a confident model and write access is a combination with no recovery path.
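As a rough sketch of that rule in Python: reads pass freely, writes are logged, and irreversible actions are blocked unless a named human has approved them. The `ActionGate` class, the `ActionKind` categories, and the in-memory audit log are hypothetical stand-ins for whatever permission and logging systems are actually in place.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ActionKind(Enum):
    READ = "read"
    WRITE = "write"                # reversible write
    IRREVERSIBLE = "irreversible"  # deletes, executed transactions, etc.


@dataclass
class ActionGate:
    # In-memory list for illustration; a real system would use durable storage.
    audit_log: list = field(default_factory=list)

    def authorize(self, kind: ActionKind, description: str,
                  approved_by: Optional[str] = None) -> bool:
        """Return True if the agent may proceed with the action."""
        # Autonomous reads: no approval, no log entry.
        if kind is ActionKind.READ:
            return True

        # Irreversible actions need a named human approver.
        if kind is ActionKind.IRREVERSIBLE and approved_by is None:
            self.audit_log.append(("blocked", description, None))
            return False

        # Every permitted write is logged, with the approver if there was one.
        self.audit_log.append(("allowed", description, approved_by))
        return True


if __name__ == "__main__":
    gate = ActionGate()
    print(gate.authorize(ActionKind.READ, "SELECT open invoices"))
    print(gate.authorize(ActionKind.WRITE, "update draft report"))
    print(gate.authorize(ActionKind.IRREVERSIBLE, "DROP TABLE invoices"))
    print(gate.audit_log)
```

Blocked attempts are logged as well, since an agent repeatedly asking to do something irreversible is itself a signal worth reviewing.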
The Org Problem Behind the Data Problem
Here’s the number that should be on every AI roadmap deck right now.
96.5% of organizations report AI already interacting with their production databases. That’s not a pilot. That’s not a roadmap item. That’s already happening. Yet 64.3% of those same organizations cite data quality issues as a top AI-related risk — and only 28.1% report standardized, consistently enforced database change governance [3].
The gap between those numbers is where the disasters live.
Gartner found that by end of 2025, at least 50% of generative AI projects were abandoned after proof of concept, due to poor data quality, inadequate risk controls, escalating costs, or unclear business value [4]. Not talent. Not the wrong model. Poor data quality is the first cause on that list. The industry is finally saying out loud what engineers have known for years. It just took autonomous agents acting on the bad data to make it visible to leadership.
Data quality investment has been chronically deprioritized because its failure mode is invisible: it shows up as a messy report, someone sighs, someone patches it, nothing explodes. Agentic AI changes the failure mode. Now bad data shows up as a canceled contract, an approved fraudulent application, a fabricated audit trail, all of it done before a human ever saw the output.
Informatica’s CDO Insights 2026 survey of 600 data leaders found that while nearly half of organizations have already moved into Agentic AI, 75% of data leaders say their employees need upskilling in data literacy to responsibly use AI [5]. The agents are moving. The humans can’t keep up with what they’re doing or why.
Gartner predicts that more than 40% of Agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls [6]. Inadequate risk controls. Not bad models. Not wrong use cases. Risk controls that don’t account for the data quality of what the agents are running on.
Agentic AI doesn’t create new data quality problems. It reveals the ones you already had — at speed, with consequences attached.
The best model in the world is just a faster way to act on bad data at scale. But “fix the data” is an incomplete answer — and anyone who’s tried to push a data quality initiative without executive alignment on what the agents are actually supposed to decide knows exactly why.
Reliable Agentic AI is a collaboration between three things that rarely sit in the same room: data infrastructure that was built for a world where machines act on it without asking first; business vision clear enough to define the boundary between autonomous action and human judgment; and tooling chosen for precision, not prestige — the right LLM for the task, not the most impressive one in the benchmark.
That means schema enforcement at ingestion. Field lineage tracking. Confidence-band routing. Human gates on irreversible writes. And a business stakeholder who has signed off on exactly what the agent is empowered to do before it touches production.
The failures don’t happen because one of these was missing. They happen because all three were never in the same conversation.
Sources:
[1] Deloitte State of AI in the Enterprise 2026 — https://www.deloitte.com/global/en/issues/generative-ai/state-of-ai-in-enterprise.html
[2] Reuters / CNBC, March 14, 2024 — https://www.cnbc.com/2024/03/14/jpmorgan-to-pay-nearly-350-million-in-penalties-for-inadequate-trade-reporting-.html
[3] Liquibase, 2026 State of Database Change Governance Report, March 11, 2026 — https://www.businesswire.com/news/home/20260311497754/en/
[4] Gartner, Why Half of GenAI Projects Fail, January 26, 2026 — https://www.gartner.com/en/articles/genai-project-failure
[5] Informatica, CDO Insights 2026 — https://www.informatica.com/about-us/news/news-releases/2026/01/20260127-new-global-cdo-report-reveals-data-governance-and-ai-literacy-as-key-accelerators-in-ai-adoption.html
[6] Gartner prediction — https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027