2. Where every dollar goes

Category: Playbooks
Date: 6/8/2026

Decomposing the AI cost stack

When people describe AI as expensive, they usually gesture at GPUs. The GPU explanation is technically accurate and analytically shallow. It is the equivalent of explaining why airlines are unprofitable by pointing at jet fuel. Correct, incomplete, and not particularly useful for anyone trying to understand what changes.

Most analyses treat AI costs as a monolith: compute is expensive, things will get cheaper, wait and see. That framing misses the structure of the problem entirely. The AI cost stack has at least five distinct layers, each with its own economics, its own leverage points, and its own set of beneficiaries extracting margin. Breaking open the black box reveals something important: the obstacles to Intelligence on Tap are real, specific, and — with one or two exceptions — solvable.

Part 1 of this series established why solving them matters. The trajectory of inference costs, falling 1,000x in three years, suggests the economics of AI delivery are moving in the right direction at an unprecedented pace. But trajectory is not destiny. Understanding where every dollar currently flows is the prerequisite for identifying where compression will happen next.

Layer one: Chips and accelerators (47-67% of training cost)

The GPU explanation for AI costs is incomplete, but it is not wrong. Chips and accelerators represent the largest single layer in the training cost stack, consuming 47 to 67 cents of every training dollar depending on workload and model architecture.

The dominant supplier in this layer is capturing margins that have no parallel in modern technology history. NVIDIA's data center GPU business operates at 73-75% gross margins, exceeding the most notorious technology monopolies of prior eras. Oracle databases at peak dominance reached 65% gross margins. Cisco networking equipment, at the height of the late 1990s infrastructure buildout, operated in the same range. NVIDIA has surpassed them. The acceleration of NVIDIA's revenue and profits, both in aggregate and per employee, is stunning and unprecedented in the modern era of competitive capitalism.
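To make the margin comparison concrete, the sketch below converts the gross margins quoted above into a price-to-cost multiple. It relies only on the standard definition of gross margin; the function name `markup_over_cost` is illustrative, and the calculation treats each quoted margin as if it applied uniformly to every unit sold, which is a simplifying assumption.

```python
def markup_over_cost(gross_margin: float) -> float:
    """Return selling price as a multiple of cost of goods sold.

    Gross margin m = (price - cost) / price, so price / cost = 1 / (1 - m).
    """
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return 1.0 / (1.0 - gross_margin)


# Gross margins as quoted in the text; labels are for display only.
quoted_margins = {
    "NVIDIA data center (low end)": 0.73,
    "NVIDIA data center (high end)": 0.75,
    "Oracle at peak": 0.65,
}

for name, margin in quoted_margins.items():
    multiple = markup_over_cost(margin)
    print(f"{name}: {margin:.0%} gross margin -> price is {multiple:.2f}x cost")
```

Read this way, a ten-point gap in gross margin is larger than it sounds: a 75% margin means the buyer pays about four times the cost of goods, while a 65% margin means just under three times.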

What makes this margin durable, for now, is not the silicon itself. CUDA, NVIDIA's parallel computing platform, has been installed and learned by approximately 4.5 million developers over fifteen years. The lock-in is not hardware alone. It is software, workflow, and accumulated expertise. A developer moving from NVIDIA to an alternative accelerator does not just swap chips. They retrain their team, rewrite their code, and absorb significant transition risk at a moment when time is the scarce resource and capital is abundant for the major players. That friction is worth more to NVIDIA than any physical moat in its supply chain.

The path through this layer is already visible. Custom silicon programs at Google (TPUs), Amazon (Trainium), and Microsoft (Maia) have demonstrated 50-70% lower training costs for specific, targeted workloads. These are not theoretical cost reductions. They are in production. The strategy mirrors Google's original commodity-server approach precisely: sacrifice flexibility for efficiency on the tasks you run at scale, and extract enormous cost advantages as a result. The question for the rest of the industry is not whether custom silicon works. It is whether any given organization runs sufficient volume to justify the development cost.

Layer two: Non-chip infrastructure (24-35%)

The second layer is less visible but nearly as significant. Interconnects, servers, memory, and networking consume 24 to 35 cents of every infrastructure dollar, and this layer has its own margin extraction problem.

NVIDIA's NVLink and NVSwitch interconnect ecosystem — the high-speed fabric required to link GPUs within and across servers — adds $3,000 to $5,000 per GPU in interconnect hardware alone. High-bandwidth memory, the specialized memory that feeds GPU compute at speeds standard DRAM cannot match, costs $300 to $600 per chip. Every component in the AI server stack commands premium pricing because the demand is inelastic: there is no equivalent substitute for running frontier models at production scale.

The compounding effect matters here. Each layer in the stack carries its own margin. A data center operator buying NVIDIA GPUs, NVIDIA interconnects, specialized memory, and custom networking equipment is paying premium pricing at every step of the assembly. The fully loaded cost of a GPU cluster is substantially higher than GPU list prices suggest. This is the AI equivalent of what happened with enterprise software stacks in the 1990s, when the server cost was just the beginning of what Oracle, Sun, and middleware vendors extracted from each deployment.

Layer three: R&D and talent (29-49%)

This is the number that surprises most people analyzing AI unit economics.

Researcher compensation rivals hardware as a cost driver in the AI stack. Top AI researchers earn a median total compensation of approximately $875,000, according to industry surveys. Elite researchers — the handful capable of independently advancing the state of the art in training efficiency or model architecture — command packages ranging from $1 million to $20 million. A frontier training effort requires a team of 50 to 200 people. Personnel costs for a single major training run, including salaries, infrastructure engineering, and research operations, run $150 million to $250 million.

The R&D-to-revenue ratio for leading AI labs runs 60-150%. Mature technology companies typically spend 15-20% of revenue on research and development. AI labs are spending most of their revenue on research, and at the top of the range more than all of it, before accounting for compute costs. That is roughly 3x to 10x the ratio of a mature technology company. This is not a sign of dysfunction. It is the correct behavior for a technology in its formative phase, where the research output has not yet been fully monetized. But it is a cost component that does not appear in most discussions of AI economics, and it must.
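The talent figures above can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this section. The payroll calculation assumes every team member earns the quoted median, which is a deliberate simplification, so it covers compensation only and will not reproduce the fully loaded $150 million to $250 million figure.

```python
# Figures quoted in the text.
MEDIAN_TOTAL_COMP_USD = 875_000
TEAM_SIZE_RANGE = (50, 200)
LAB_RD_TO_REVENUE = (0.60, 1.50)     # leading AI labs
MATURE_RD_TO_REVENUE = (0.15, 0.20)  # mature technology companies


def payroll_at_median(headcount: int, median_comp: int) -> int:
    """Annual payroll if every team member earned the median (a simplification)."""
    return headcount * median_comp


def ratio_multiple_range(lab, mature):
    """Smallest and largest multiple of the mature-company R&D ratio."""
    lowest = lab[0] / mature[1]   # low lab ratio vs. high mature ratio
    highest = lab[1] / mature[0]  # high lab ratio vs. low mature ratio
    return lowest, highest


payroll_low = payroll_at_median(TEAM_SIZE_RANGE[0], MEDIAN_TOTAL_COMP_USD)
payroll_high = payroll_at_median(TEAM_SIZE_RANGE[1], MEDIAN_TOTAL_COMP_USD)
multiple_low, multiple_high = ratio_multiple_range(
    LAB_RD_TO_REVENUE, MATURE_RD_TO_REVENUE
)

print(f"Median-comp payroll: ${payroll_low:,} to ${payroll_high:,} per year")
print(f"Lab R&D intensity: {multiple_low:.1f}x to {multiple_high:.1f}x mature firms")
```

At the median, compensation alone for a 200-person team is $175 million a year, which shows how quickly headcount reaches the scale of the personnel figure cited above.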

The talent layer also has a different compression curve than chips. GPU prices fall as manufacturing scales and competition emerges. Researcher salaries do not follow the same trajectory. As the number of organizations competing to hire elite AI talent continues to grow, this layer may prove stickier than any other in the stack.

Layer four: Data (emerging but structural)

The free data era is over.

Through approximately 2022, frontier AI models were trained substantially on internet data scraped at low or zero marginal cost. The legal and commercial landscape has shifted decisively. Known data licensing deals exceeded $800 million in aggregate in 2024, with more announced regularly. Reddit's content licensing arrangement with Google runs approximately $60 million per year. Arrangements with major news organizations, academic publishers, and other content owners are establishing market rates that will only increase as AI companies compete for training data with demonstrable quality advantages.

Frontier labs are also paying credentialed experts directly to generate new training data from scratch. Scale AI and Mercor have built billion-dollar businesses recruiting lawyers, doctors, teachers, investment bankers, senior developers, and math olympiad winners at rates of $200 per hour and more for the reasoning-heavy datasets that post-training now demands.

Beyond licensing costs, copyright exposure adds billions of dollars in potential liability to the balance sheets of AI labs actively litigating their data practices. The litigation risk alone is altering procurement behavior. Several labs have shifted meaningfully toward synthetic data generation, which carries its own computational cost but reduces legal exposure. Either way, the days of training foundation models on essentially free internet data are behind the industry. Data has moved from a near-zero-cost input to a structural line item with its own set of market dynamics.

Layer five: Energy (small today, strategic tomorrow)

Energy represents only 2-6% of training costs at current prices — a relatively modest share of the total stack. The forward picture is different.

AI-optimized data center facilities cost $20 million to $40 million per megawatt to build, compared to $7 million to $12 million for traditional data center construction. Power consumption per rack has increased roughly 10x over five years as compute density has grown. The power demands of frontier model training are doubling approximately every year, with projections of 4 to 16 gigawatts of dedicated AI compute capacity required by 2030.
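The growth and cost figures in the paragraph above become easier to compare once they are put on a common footing. The sketch below is illustrative only. It annualizes the 10x rack-power increase, computes the construction premium per megawatt, and expresses the 2030 capacity range in doublings. The 100 MW facility size is a hypothetical value chosen for the arithmetic and does not come from the source.

```python
import math


def annualized_growth(total_multiple: float, years: float) -> float:
    """Constant yearly growth factor that compounds to total_multiple over `years`."""
    if total_multiple <= 0 or years <= 0:
        raise ValueError("total_multiple and years must be positive")
    return total_multiple ** (1.0 / years)


# Figures quoted in the text.
RACK_POWER_MULTIPLE = 10.0      # roughly 10x per-rack power draw
RACK_POWER_YEARS = 5.0          # over five years
AI_COST_PER_MW = (20e6, 40e6)   # USD per megawatt, AI-optimized build
TRAD_COST_PER_MW = (7e6, 12e6)  # USD per megawatt, traditional build
CAPACITY_2030_GW = (4.0, 16.0)  # projected dedicated AI capacity range

# Hypothetical facility size (an assumption, not from the source).
FACILITY_MW = 100.0

rack_growth = annualized_growth(RACK_POWER_MULTIPLE, RACK_POWER_YEARS)
premium_low = AI_COST_PER_MW[0] / TRAD_COST_PER_MW[1]
premium_high = AI_COST_PER_MW[1] / TRAD_COST_PER_MW[0]
ai_build_cost = tuple(cost * FACILITY_MW for cost in AI_COST_PER_MW)
range_in_doublings = math.log2(CAPACITY_2030_GW[1] / CAPACITY_2030_GW[0])

print(f"Rack power grows about {rack_growth - 1:.0%} per year")
print(f"AI build premium: {premium_low:.1f}x to {premium_high:.1f}x per MW")
print(f"{FACILITY_MW:.0f} MW AI facility: "
      f"${ai_build_cost[0] / 1e9:.1f}B to ${ai_build_cost[1] / 1e9:.1f}B")
print(f"2030 projection range spans {range_in_doublings:.0f} doublings")
```

If demand really doubles every year, the 4 to 16 gigawatt range amounts to about two years of uncertainty in timing, so the width of the projection says less about the destination than about when it arrives.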

Energy is not yet a cost problem. It is becoming a bottleneck problem. The limiting factor in several hyperscaler expansion plans is not capital, not silicon availability, and not talent. It is the speed at which power infrastructure can be permitted, built, and connected. This constraint does not appear prominently in cost-per-token calculations today, but it is already shaping where data centers get built, how quickly capacity expands, and which providers can serve latency-sensitive workloads from which geographies. Energy is the layer that starts small and ends strategic.

Sources: SemiAnalysis, LambdaLabs, a16z infrastructure analysis; percentages vary by model type and workload

The amortization problem

Cutting across all five layers is a structural distortion in how AI infrastructure costs get reported.

GPU hardware becomes economically obsolete in 2 to 4 years as each new generation delivers roughly 2x the price-performance of its predecessor. Companies depreciate that hardware over 5 to 6 years using accounting conventions designed for physical infrastructure that wears out slowly. The result is a systematic understatement of true infrastructure costs in reported financial results.

This is not an academic observation. It directly affects unit economics comparisons. A data center that depreciates its H100 fleet over 6 years and then faces replacement pressure from B200 availability in year 3 is carrying assets on its balance sheet at values that do not reflect their remaining economic utility. The true cost of AI infrastructure is higher than reported depreciation suggests, and any analysis that ignores this mismatch will reach systematically optimistic conclusions about the pace of margin improvement.
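The mismatch can be sketched with straight-line depreciation. This is a minimal illustration, not a description of any company's accounting policy: the $1 billion fleet cost is a hypothetical figure, and the 6-year and 3-year lives are the accounting schedule and replacement point from the example above.

```python
def annual_charge(cost: float, life_years: float) -> float:
    """Straight-line depreciation expense per year."""
    return cost / life_years


def book_value(cost: float, life_years: float, age_years: float) -> float:
    """Straight-line book value remaining after `age_years`, floored at zero."""
    remaining_years = max(life_years - age_years, 0.0)
    return cost * remaining_years / life_years


FLEET_COST_USD = 1_000_000_000  # hypothetical fleet cost (assumption)
ACCOUNTING_LIFE_YEARS = 6       # depreciation schedule from the example
ECONOMIC_LIFE_YEARS = 3         # replacement pressure arrives in year 3

reported = annual_charge(FLEET_COST_USD, ACCOUNTING_LIFE_YEARS)
economic = annual_charge(FLEET_COST_USD, ECONOMIC_LIFE_YEARS)
stranded = book_value(FLEET_COST_USD, ACCOUNTING_LIFE_YEARS, ECONOMIC_LIFE_YEARS)

print(f"Reported annual charge: ${reported / 1e6:,.0f}M")
print(f"Economic annual charge: ${economic / 1e6:,.0f}M")
print(f"Understatement factor:  {economic / reported:.1f}x")
print(f"Book value at year {ECONOMIC_LIFE_YEARS}: ${stranded / 1e6:,.0f}M")
```

Under these assumptions the reported annual cost is half the economic cost, and half the purchase price is still on the balance sheet when the hardware faces replacement. Across the ranges quoted earlier (2 to 4 years of economic life against 5 to 6 years of accounting life), the understatement factor runs from 1.25x to 3x.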

The resolution is architectural separation: dedicated training hardware on accelerated replacement cycles, inference hardware on longer ones. Training workloads drive the frontier and require the latest silicon. Inference workloads operate on more mature, price-stable hardware and can tolerate longer replacement cycles. Managing these fleets separately, with different replacement cadences and different depreciation treatment, is a meaningful lever for improving both economics and the accuracy of how those economics are reported.

Where the leverage is

Breaking open the AI cost stack reveals a consistent pattern. Each layer has near-term defenders: NVIDIA's CUDA ecosystem, specialized memory suppliers, hyperscaler talent markets, incumbent data holders. But each layer also has a credible compression path.

The AI cost stack is not a monolith. It is a set of discrete engineering problems, each with identifiable leverage points, progressing at different speeds.

The Google precedent matters here as well. Google did not solve its infrastructure cost problem in a single move. It addressed commodity servers first, then custom networking, then eventually custom silicon with TPUs — a process that took nearly a decade. AI is following the same sequence, compressed into a shorter timeframe by competitive pressure and the pace of underlying hardware improvement.

The most consequential compression path operates on the work itself. DeepSeek trained DeepSeek-V3, a 671-billion-parameter mixture-of-experts model, for $5.6 million in 2.79 million GPU hours, less than 10% of Llama 3.1 405B's compute. Sparse activation means only 37 billion of those parameters fire per token at inference, and distillation packs reasoning into smaller models, so savings compound on every query served, not just on the training run.
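The DeepSeek-V3 figures quoted above imply a few ratios worth seeing directly. The sketch below derives them from the numbers in this paragraph alone. The Llama figure is only a lower bound implied by the "less than 10%" claim, not an independently sourced number.

```python
# Figures quoted in the text.
TRAINING_COST_USD = 5.6e6
TRAINING_GPU_HOURS = 2.79e6
TOTAL_PARAMS_BILLIONS = 671
ACTIVE_PARAMS_BILLIONS = 37
CLAIMED_SHARE_OF_LLAMA = 0.10  # "less than 10%" of Llama 3.1 405B's compute

# Implied rental-style cost per GPU hour.
cost_per_gpu_hour = TRAINING_COST_USD / TRAINING_GPU_HOURS

# Share of parameters active for each token under sparse activation.
active_fraction = ACTIVE_PARAMS_BILLIONS / TOTAL_PARAMS_BILLIONS

# Lower bound on Llama 3.1 405B GPU hours implied by the 10% claim.
implied_llama_gpu_hours_floor = TRAINING_GPU_HOURS / CLAIMED_SHARE_OF_LLAMA

print(f"Implied cost per GPU hour: ${cost_per_gpu_hour:.2f}")
print(f"Active parameters per token: {active_fraction:.1%}")
print(f"Implied Llama 3.1 405B compute: more than "
      f"{implied_llama_gpu_hours_floor / 1e6:.1f}M GPU hours")
```

The active-parameter share is the figure that carries over to inference: roughly one parameter in eighteen participates in any given token, which is why the saving recurs on every query served.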

Custom silicon for the chip layer. Open interconnect standards and next-generation memory for non-chip infrastructure. Open-source model development (Meta's Llama family being the most visible example) that amortizes talent costs across the industry rather than concentrating them at individual labs. Synthetic data generation as a substitute for licensed content. Geographic diversification and purpose-built efficiency gains for energy. None of these solutions arrives simultaneously. None arrives without friction. But the structure of the problem is now clear.

The companies that understand exactly where their dollars are going are the ones positioned to compress costs where compression is possible, and to avoid spending capital chasing leverage that does not exist. That clarity is what separates an engineering problem from an open question.

Part 1 of this series established that AI's cost trajectory looks like Google's: falling fast, with volume growth that will eventually outrun price decline. Part 2's answer is that the cost stack, when decomposed, looks like Google's as well: multiple discrete layers, each solvable, none solved all at once. The companies that worked through those layers methodically — Google with servers, then networking, then silicon — built the most valuable infrastructure businesses in history.

Intelligence on Tap requires working through the same layers in AI. The stack is visible. The leverage points are identifiable. The only question is sequence and speed.

Sources and data notes

Cost stack percentages: Industry estimates based on LambdaLabs, SemiAnalysis, and a16z infrastructure analysis; percentages vary by model type and infrastructure configuration.

NVIDIA margins: NVIDIA 10-K and quarterly earnings filings; CUDA developer count from NVIDIA public disclosures.

Custom silicon cost reductions: Google TPU performance data from Google Research; Amazon Trainium from AWS re:Invent disclosures; Microsoft Maia from public announcements.

Interconnect and memory costs: SemiAnalysis GPU server teardown analysis; HBM pricing from memory industry surveys.

Researcher compensation: Levels.fyi AI researcher salary surveys; company filings and public disclosures.

Data licensing: Licensing deal aggregates from industry reporting; Reddit/Google licensing terms from SEC filings.

Energy costs: McKinsey data center construction cost estimates; Uptime Institute power density surveys; IEA AI energy consumption projections.

DeepSeek-V3: https://www.deeplearning.ai/the-batch/deepseek-v3-redefines-llm-performance-and-cost-efficiency/, https://arxiv.org/html/2412.19437v1, https://deepwiki.com/deepseek-ai/DeepSeek-V3/3.3-mixture-of-experts-(moe)

