
The first surprise of enterprise AI is how quickly the conversation leaves the model and moves to everything around it: data centers, power contracts, GPU supply, cooling, latency, vendor dependency and inference bills that looked harmless when the pilot was small.
That is the part of the AI story many boardrooms are only beginning to absorb. The first phase of generative AI was about capability. Could the model write, code, reason, summarize and search?
The second phase is about scale.
Can we afford to run it? Can the grid support it? Can our architecture handle it? Can we measure whether all this intelligence is actually producing a business outcome?
Those questions are becoming harder to avoid.
The International Energy Agency estimates that data centers consumed around 415 terawatt-hours of electricity in 2024, roughly 1.5% of global electricity consumption. By 2030, that figure is projected to more than double to around 945 terawatt-hours, with demand growing about 15% annually from 2024 to 2030. Goldman Sachs Research has put the same shift in market terms: global power demand from data centers is forecast to rise 165% by 2030 compared with 2023 levels.
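The two IEA endpoints and the quoted growth rate can be checked against each other. The sketch below is a simple compound-growth calculation using only the figures cited above; the function name is illustrative and not taken from any IEA material.

```python
def implied_annual_growth(start_twh: float, end_twh: float, years: int) -> float:
    """Compound annual growth rate implied by a start value and an end value."""
    return (end_twh / start_twh) ** (1 / years) - 1


# IEA estimates quoted above: about 415 TWh in 2024 and about 945 TWh in 2030.
rate = implied_annual_growth(415, 945, 2030 - 2024)
print(f"Implied growth: {rate:.1%} per year")
```

The result lands a little under 15% a year, which is consistent with the "about 15% annually" figure.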
This is not a footnote to the AI story. Increasingly, it is the AI story.
For all our talk about intelligence becoming abundant, the infrastructure behind it is anything but abstract. The AI economy runs on physical constraints: GPUs, electricity, land, water, fiber, cooling, interconnects and people who can operate all of it. The cloud can make compute feel infinite to the user. It is not infinite to the company paying the bill.
Within enterprises, the same story unfolds on a smaller scale. The demo is cheap. The pilot is manageable. A few teams use AI to summarize documents, assist developers or draft support responses. The costs look acceptable.
Then the company tries to scale.
Suddenly, one workflow is not one model call. An agent may search, retrieve, reason, call a tool, check a policy, ask another model to verify, retry after an error, summarize the result and log the action. One business task becomes a chain of calls. Multiply that across thousands of employees, customers and automated workflows, and the economics change quickly.
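A rough way to see how one task becomes a chain of calls is to add up the steps. The sketch below is illustrative only: the step names loosely follow the chain described above, and every call count, per-call price, retry rate and task volume is a hypothetical placeholder, not a measured figure.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    calls: int            # model calls this step makes per task (hypothetical)
    cost_per_call: float  # USD per call (hypothetical)


# Hypothetical agent workflow, loosely following the chain described above.
WORKFLOW = [
    Step("retrieve", 1, 0.002),
    Step("reason", 2, 0.010),
    Step("tool_call", 1, 0.004),
    Step("policy_check", 1, 0.002),
    Step("verify", 1, 0.010),
    Step("summarize_and_log", 1, 0.003),
]


def cost_per_task(steps, retry_rate=0.0):
    """Cost of one task; retry_rate is the share of tasks rerun in full (a simplification)."""
    base = sum(s.calls * s.cost_per_call for s in steps)
    return base * (1 + retry_rate)


def monthly_cost(steps, tasks_per_day, retry_rate=0.0, days=30):
    """Scale the per-task cost to a month of traffic."""
    return cost_per_task(steps, retry_rate) * tasks_per_day * days


print(f"Per task: ${cost_per_task(WORKFLOW):.3f}")
print(f"Per month at 50,000 tasks/day: ${monthly_cost(WORKFLOW, 50_000, retry_rate=0.1):,.0f}")
```

With these made-up numbers, a task that costs about four cents becomes a bill of tens of thousands of dollars a month once volume and retries are included.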
This is the inference paradox.
When the cost of intelligence falls, companies do not necessarily spend less on intelligence. They use more of it. Cheaper storage did not make companies store less data. Cheaper compute did not make them run fewer applications. Cheaper tokens will not make enterprises use fewer model calls. If anything, agents will increase consumption because they turn AI from an occasional assistant into an always-on layer of work.
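The arithmetic behind the paradox is short: total spend is unit price times volume, so spend rises whenever usage grows faster than price falls. A minimal sketch, with both factors chosen purely for illustration:

```python
def spend_change(price_drop_factor: float, usage_growth_factor: float) -> float:
    """Ratio of new spend to old spend when unit price falls and usage rises."""
    return usage_growth_factor / price_drop_factor


# Hypothetical: tokens become 10x cheaper while agents drive 30x more usage.
ratio = spend_change(price_drop_factor=10, usage_growth_factor=30)
print(f"Total spend changes by {ratio:.1f}x")
```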
This is why AI strategy is becoming infrastructure strategy.
Gartner has warned that more than 40% of agentic AI projects may be canceled by the end of 2027 because of escalating costs, unclear business value or inadequate risk controls. The same forecast expects 33% of enterprise software applications to include agentic AI by 2028, up from less than 1% in 2024.
Put those two facts together and the picture becomes clearer. Agents are coming. Many agent projects will also fail. Not because the idea is wrong, but because companies will underestimate the operating discipline required to run them. A chatbot can be treated like a feature. An agent behaves more like an operating unit. It needs identity, permissions, logs, memory, error handling, escalation rules, cost controls and security boundaries. It also needs a way to know whether it completed the task correctly, and at what price.
As we approach 2026, the biggest AI question in many companies is no longer just, “Are we moving fast enough?” It is also, “Can we afford to run what we are building?”
That question changes the conversation. The next AI metric is not tokens per dollar. It is dollars per business outcome. Cost per resolved support ticket. Cost per contract reviewed. Cost per bug fixed. Cost per qualified lead. Cost per hour saved that actually turns into margin, speed or revenue.
That sounds obvious, but most companies are still not set up to measure it. They can track usage. They can track licenses. They can show that employees are using copilots. But usage is not value. A lot of AI activity sits at the edge of the company, making individuals feel faster while the operating model remains mostly unchanged.
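One possible way to express the metric is to divide everything spent on a workflow by the outcomes it actually delivered. The sketch below assumes a support-ticket example; the function, its parameters and all the figures are hypothetical, and a real measurement would need the company's own definition of a successful outcome.

```python
def cost_per_outcome(ai_spend: float, attempts: int, success_rate: float,
                     human_review_cost: float = 0.0) -> float:
    """Total cost divided by successful outcomes.

    ai_spend: inference and tooling cost for the period, in USD.
    attempts: tasks the system tried to handle.
    success_rate: share of attempts that produced the business outcome.
    human_review_cost: USD of human time per attempt, if any.
    """
    outcomes = attempts * success_rate
    if outcomes <= 0:
        raise ValueError("no successful outcomes to divide by")
    return (ai_spend + human_review_cost * attempts) / outcomes


# Hypothetical month: $4,000 of inference across 10,000 tickets, 60% resolved.
print(f"${cost_per_outcome(4_000, 10_000, 0.6):.2f} per resolved ticket")
```

Tracking usage alone would report 10,000 AI interactions. The outcome view reports what each resolved ticket cost, which is the number that can be compared with the process it replaces.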
The infrastructure winners will think differently.
They will understand that AI scale is not one bottleneck. It is a stack of bottlenecks. At the bottom is power. Above that is data center capacity. Then GPUs. Then inference engines. Then model routing. Then application design. A company that treats all of this as “cloud spend” is going to miss the strategic shift.
This is already visible in how the largest technology companies are behaving. Microsoft signed a 20-year power purchase agreement with Constellation to support the restart of Three Mile Island Unit 1, now positioned as the Crane Clean Energy Center, to match data center electricity use with carbon-free energy. Google signed a deal with Kairos Power to buy energy from multiple small modular reactors, targeting up to 500 MW by 2035. Amazon expanded its nuclear-linked relationship with Talen Energy through a 1,920 MW power purchase agreement to support operations in Pennsylvania.
That tells us something important. AI infrastructure is no longer just about buying chips. It is about securing energy.
But chips still matter. A lot. Nvidia’s Blackwell platform was announced with a specific promise: running real-time generative AI on trillion-parameter models at up to 25x less cost and energy consumption than the prior generation. That is not just a hardware upgrade. It is a business model unlock. If one generation of chips can materially reduce cost per token, applications that looked too expensive can suddenly become viable.
There is also a layer of innovation above the chip. The question is not only, “How many GPUs do I have?” It is, “How much useful work can I squeeze out of each GPU?”
Nvidia’s TensorRT-LLM showed how inference software can change economics. On Llama 2 70B, Nvidia reported that H100 GPUs running TensorRT-LLM delivered a 4.6x performance speedup over an A100 baseline, which it translated into a 3x reduction in total cost of ownership and a 3.2x reduction in energy consumed. vLLM’s PagedAttention work showed the same broader idea from the open-source side: better memory management can produce up to 24x higher throughput than Hugging Face Transformers and up to 3.5x higher throughput than Text Generation Inference in its early experiments.
Business leaders do not need to understand every detail of FP8 kernels or memory paging. But they do need to understand the implication: AI cost is not fixed. The same model can be wildly more or less expensive depending on how it is served, batched, cached, quantized and routed. Post-training and distillation will matter here too: large frontier models can teach smaller, cheaper models to perform narrow enterprise tasks reliably. You don’t need a trillion-parameter model that can write Shakespeare to process invoices, review contracts or file taxes.
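A back-of-the-envelope formula makes the point concrete: the cost of serving a million tokens is the hourly price of the hardware divided by how many tokens it produces in that hour. The sketch below uses a hypothetical GPU price and hypothetical throughput figures, and it holds the hardware price constant. Real comparisons across chip generations also change the hourly price, so cost savings need not equal the speedup.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Serving cost per one million generated tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical: same $4/hour GPU, before and after a 4.6x throughput improvement.
baseline = cost_per_million_tokens(gpu_hourly_usd=4.0, tokens_per_second=1_000)
optimized = cost_per_million_tokens(gpu_hourly_usd=4.0, tokens_per_second=4_600)
print(f"Baseline:  ${baseline:.2f} per million tokens")
print(f"Optimized: ${optimized:.2f} per million tokens")
```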
This is where model routing becomes important. Many companies still send too many tasks to the most powerful model because it feels safe. That is expensive and often unnecessary. A simple extraction, classification or support query does not always need a frontier model. Research such as FrugalGPT showed that cascading across models can match the performance of the best individual LLM with up to 98% cost reduction in some experiments.
That is likely where enterprise AI is heading. Not one model for everything, but a routing layer that decides which model, tool or workflow should handle the task. This is not glamorous work. It will not produce the viral demo. But it is the work that separates an AI experiment from a production-grade AI system.
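A routing layer can be as simple as a cascade: try the cheapest model first and escalate only when the answer does not look good enough. The sketch below is in the spirit of the cascading idea, not a reproduction of FrugalGPT. The two "models" are stand-in functions, the scorer is a toy rule, and the prices and threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost: float                   # USD per call (hypothetical)
    answer: Callable[[str], str]  # stand-in for a real model call


def cascade(task: str, tiers: List[Tier], score: Callable[[str, str], float],
            threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most capable; stop at the first acceptable answer."""
    spent = 0.0
    for i, tier in enumerate(tiers):
        result = tier.answer(task)
        spent += tier.cost
        is_last = i == len(tiers) - 1
        if is_last or score(task, result) >= threshold:
            return tier.name, result, spent
    raise ValueError("tiers must not be empty")


# Stand-in models: the small one only handles a narrow task it recognizes.
def small_model(task: str) -> str:
    return "invoice" if "invoice" in task.lower() else "UNSURE"


def large_model(task: str) -> str:
    return f"detailed answer for: {task}"


def toy_score(task: str, answer: str) -> float:
    """Toy quality check; a real system would use a learned or rule-based scorer."""
    return 0.0 if answer == "UNSURE" else 0.9


TIERS = [Tier("small", 0.001, small_model), Tier("frontier", 0.03, large_model)]

for task in ["Classify this invoice", "Draft a merger risk memo"]:
    name, _, spent = cascade(task, TIERS, toy_score)
    print(f"{task!r} handled by {name} for ${spent:.3f}")
```

The design choice that matters is the scorer. If it accepts weak answers, the cascade saves money by lowering quality; if it rejects good ones, every task pays for both models.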
If AI becomes core to how a company serves customers, writes code, manages operations or makes decisions, then compute access and inference economics become board-level concerns. Vendor concentration matters. Latency matters. Energy availability matters. GPU supply matters. Model routing matters.
The companies that win will not be the ones that use the most AI. They will be the ones that waste the least. They will know which tasks deserve premium intelligence and which do not. They will treat compute as scarce even when the interface makes it feel abundant. They will measure dollars per business outcome, not just tokens per dollar.
The first AI winners proved intelligence could be packaged. The next winners will prove it can be delivered reliably, affordably and at scale.
Sources
- International Energy Agency, Energy and AI, 2025.
- Goldman Sachs Research, How AI Is Transforming Data Centers And Ramping Up Power Demand, 2025.
- Gartner, Over 40% Of Agentic AI Projects Will Be Canceled By End Of 2027, 2025.
- Constellation / Microsoft, Crane Clean Energy Center announcement.
- Google / Kairos Power nuclear energy agreement.
- Talen Energy / Amazon power purchase agreement.
- Nvidia Blackwell platform announcement.
- Nvidia TensorRT-LLM technical blog.
- vLLM, Easy, Fast, and Cheap LLM Serving with PagedAttention.
- FrugalGPT paper.