The "Walled Garden" has officially fallen. For a decade, Google's Tensor Processing Units (TPUs) were the forbidden fruit of the AI world. You could rent them inside Google Cloud, observe their power from a distance as they trained AlphaGo and Gemini, but you could never own them. You certainly couldn't install them in your own data center.
That rule just broke.
In a move that signals a structural shift in the trillion-dollar AI market, emerging reports indicate Google is finalizing deals to rent and eventually sell TPUs directly to Meta and potentially Apple. This is not just a supplier agreement; it is the first real crack in Nvidia's armor and a desperate, brilliant move by Google to save its own software ecosystem.
If you are holding Nvidia stock or building an AI infrastructure strategy, you need to understand the physics, the economics, and the software wars that drove this decision.
The News: Strange Bedfellows
The enemy of my enemy is my customer. Meta (Facebook) has been the world's most voracious consumer of Nvidia H100s, accumulating over 600,000 of them. But Mark Zuckerberg has been open about his desire to break free from the "Nvidia Tax": the premium paid for CUDA lock-in and high-margin hardware.
According to industry whispers and early reports:
- The Deal: Meta will gain access to "bare metal" TPU instances (likely Trillium v6), initially hosted by Google but partitioned completely from Google's public cloud.
- The Roadmap: By 2027, the framework allows Meta to purchase TPU pods outright for installation in its own non-Google data centers.
- The Apple Factor: Similar discussions are reportedly occurring with Apple, which needs massive compute for "Apple Intelligence" but is famously allergic to relying on competitors like Microsoft or paying Nvidia's margins.
This changes the supply/demand equation overnight. If Meta stops buying 20% of Nvidia's supply because it has switched to TPUs, the scarcity that supports Nvidia's pricing power evaporates.
Technical Deep Dive: The Physics of Light vs. Copper
Why would Meta want a TPU when the Nvidia Blackwell B200 is the "most powerful chip in the world"? The answer isn't in the chip itself; it's in the wires. Or rather, the lack of them.
The OCS Advantage (Optical Circuit Switching)
Nvidia's SuperPODs rely on InfiniBand, a wildly fast but traditional electrical switching network. To connect 100,000 GPUs, you need miles of copper and active electrical switches that constantly convert photons to electrons and back.
Google's secret weapon, deployed since TPU v4, is the Palomar Optical Circuit Switch (OCS), the same switching fabric behind its Jupiter data-center network.
Instead of digital packet switching (where a router reads a packet, buffers it, decides where it goes, and re-transmits it), Google uses MEMS (Micro-Electro-Mechanical Systems): arrays of microscopic mirrors that physically rotate to bounce a beam of light from one server to another.
The Physics Wins:
- Speed of Light: There is zero optical-to-electrical conversion. The light just bounces.
- Near-Zero Power: A mirror needs essentially no power to hold its position; it only draws power when it moves (reconfigures). An electrical switch burns wattage 24/7 just to keep its ports active.
- Topology Agnostic: Google can physically re-wire the network in milliseconds. If a "Torus" shape is better for Training Job A and a "Dragonfly" shape is better for Inference Job B, the mirrors just tilt and the supercomputer is physically re-wired (see the toy model below).
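To make the mirror trick concrete, here is a toy Python model (a sketch of the concept, not Google's actual control software) of an OCS as a reconfigurable port-to-port map: routing is a mirror lookup, and changing the topology is a single rewrite of that map.

```python
# Toy model: an OCS is a set of mirrors, i.e. a port -> port mapping.
# "Switching" never parses or buffers a packet; it is pure reflection.

def torus_neighbors(n: int) -> dict[int, int]:
    """1-D torus wiring: each node's outgoing light lands on the next node."""
    return {i: (i + 1) % n for i in range(n)}

def bisection_neighbors(n: int) -> dict[int, int]:
    """A different shape: each node pairs with the node across the pod."""
    return {i: (i + n // 2) % n for i in range(n)}

class OpticalCircuitSwitch:
    def __init__(self, n_ports: int) -> None:
        self.mirror_map = torus_neighbors(n_ports)  # initial mirror tilt

    def reconfigure(self, new_map: dict[int, int]) -> None:
        # Power is spent only here, while the mirrors physically move;
        # holding a position afterwards costs essentially nothing.
        self.mirror_map = new_map

    def route(self, src_port: int) -> int:
        # Light entering src_port simply reflects to the mapped port.
        return self.mirror_map[src_port]

ocs = OpticalCircuitSwitch(n_ports=8)
assert ocs.route(0) == 1                 # torus wiring
ocs.reconfigure(bisection_neighbors(8))
assert ocs.route(0) == 4                 # physically re-wired in one step
```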
The Efficiency Math
While Nvidia chases raw FLOPs, Google chases performance per total cost of ownership (TCO). Trillium (the v6 TPU) claims a 67% energy-efficiency improvement over v5e, largely due to this interconnect efficiency.
Let's look at the theoretical cooling math. Electrical switches generate heat. That heat must be cooled. OCS switches generate almost no heat.
In an Nvidia cluster, networking can account for 15-20% of total cluster power draw. In a TPU pod, OCS cuts that networking draw by nearly 95%. In a 100 MW data center, that delta frees enough headroom for roughly 20% more compute in the same power envelope (the arithmetic is sketched below). For Meta, training Llama 5, that is worth billions.
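A quick back-of-the-envelope check on that claim, using only the assumptions stated above (a 100 MW facility, networking at 15-20% of draw, ~95% of it eliminated):

```python
# Back-of-the-envelope: how much compute headroom does killing
# networking power actually buy? All inputs are the article's
# illustrative assumptions, not measured figures.

TOTAL_MW = 100.0
OCS_SAVINGS = 0.95  # fraction of networking power eliminated by OCS

for net_share in (0.15, 0.20):
    networking_mw = TOTAL_MW * net_share
    freed_mw = networking_mw * OCS_SAVINGS
    compute_before = TOTAL_MW - networking_mw
    gain = freed_mw / compute_before
    print(f"networking at {net_share:.0%}: frees {freed_mw:.1f} MW "
          f"-> {gain:.0%} more compute in the same envelope")

# networking at 15%: frees 14.2 MW -> 17% more compute in the same envelope
# networking at 20%: frees 19.0 MW -> 24% more compute in the same envelope
```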
The Software War: JAX vs. CUDA
This is the part Wall Street misses. Google isn't just selling chips because it wants hardware revenue. It's selling chips to save JAX.
The CUDA Moat
Nvidia's monopoly isn't hardware; it's software. Everyone codes against CUDA (usually indirectly, through PyTorch). Because everyone uses CUDA, everyone buys Nvidia. It's a self-reinforcing loop.
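To see how deep that loop runs, here is the device-selection idiom hard-coded into countless PyTorch scripts (a generic illustration, not any particular codebase): the string "cuda" is baked into the ecosystem's muscle memory.

```python
# The lock-in in miniature: the standard PyTorch setup that millions
# of training scripts hard-code. Switching vendors means touching every
# call site like this, plus any custom CUDA kernels underneath.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)  # dispatches to cuBLAS/cuDNN kernels when device == "cuda"
```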
Google's JAX
Google's homegrown framework, JAX, is arguably superior for advanced physics and heavy math: it optimizes automatically through the XLA (Accelerated Linear Algebra) compiler. But outside of DeepMind and specialized researchers, adoption is low.
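A minimal example of what "optimizes automatically" means in practice: jax.jit traces ordinary NumPy-style code once, and XLA compiles a fused program for whatever backend is present (CPU, GPU, or TPU) with no device-specific code.

```python
# JAX in a nutshell: write NumPy-style math, let XLA compile it.
import jax
import jax.numpy as jnp

@jax.jit  # trace once; XLA fuses the matmul, scale, and softmax
def attention_scores(q, k):
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]))

q = jnp.ones((128, 64))
k = jnp.ones((128, 64))
print(attention_scores(q, k).shape)  # (128, 128)
print(jax.devices())                 # TPU, GPU, or CPU: same code either way
```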
By putting TPUs into Meta's hands, Google is forcing Meta's engineers, some of the best in the world outside Google, to optimize PyTorch for XLA (a sketch of that porting step follows this list).
- If PyTorch runs perfectly on TPUs (via XLA), the "CUDA Moat" fills with sand.
- Suddenly, AMD chips (which XLA also targets, via ROCm) become viable.
- Intel Gaudi becomes viable.
- The entire ecosystem de-couples from Nvidia.
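Here is roughly what that porting step looks like today, assuming the existing torch_xla bridge (the package PyTorch already uses to target TPUs): the model code is untouched; only the device handle changes.

```python
# PyTorch on XLA, sketched with the torch_xla package: same model code,
# different device handle. This is the decoupling described above.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # an XLA device (e.g. a TPU core), not "cuda"
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
loss = model(x).sum()
loss.backward()
xm.mark_step()  # cut the lazy graph; XLA compiles and executes it here
```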
Contextual History: A Decade of Secrets
Google invented the Transformer architecture (the "T" in GPT) in 2017. They built the first TPU in 2015. They were years ahead. Why did they lose?
- 2016 (TPU v1): Built purely for inference (running AlphaGo).
- 2017-2018 (TPU v2/v3): The training beasts (v3 liquid-cooled). But Google kept them locked in Cloud: "If you want the chip, you must use our cloud."
- 2020-2023 (The Hubris): Google assumed their hardware advantage was infinite. They didn't sell the chips.
- 2024 (The Panic): Microsoft and OpenAI seized the lead using Nvidia GPUs. AWS built Trainium.
- 2025 (The Pivot): Google realizes that "Cloud Exclusivity" is a death sentence. Developers go where the chips are. If the chips aren't everywhere, the developers leave.
This sale is an admission of failure in strategy, but a stroke of genius in recovery.
Financial Analysis: The "Nvidia Tax" Calculation
Let's run the numbers on why Meta is taking this deal.
The Nvidia Markup:
- H100 Manufacturing Cost (TSMC + CoWoS + HBM): ~$4,000
- H100 Sale Price: ~$30,000
- Gross Margin: ~87%
The TPU Economics: Google designs the TPU in-house and uses Broadcom for the physical back-end design. They don't need an 87% margin on Meta. They can sell the TPU for $15,000, still make a healthy 50%+ margin, and offer Meta a 50% discount relative to Nvidia.
For a cluster of 100,000 chips:
- Nvidia Cost: $3 Billion
- TPU Cost: $1.5 Billion
Savings: $1.5 Billion. That pays for the entire cooling infrastructure of the data center.
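Pulling the piece's estimates together in one place (these are the illustrative figures from the paragraphs above, not disclosed prices):

```python
# Deal math using the article's estimated figures (not disclosed prices).
H100_COST = 4_000      # est. manufacturing cost (TSMC + CoWoS + HBM)
H100_PRICE = 30_000    # est. sale price
TPU_PRICE = 15_000     # hypothetical price Google could charge Meta
CLUSTER_CHIPS = 100_000

nvidia_margin = 1 - H100_COST / H100_PRICE
discount = 1 - TPU_PRICE / H100_PRICE
savings = CLUSTER_CHIPS * (H100_PRICE - TPU_PRICE)

print(f"Nvidia gross margin:  {nvidia_margin:.0%}")           # 87%
print(f"Discount vs. Nvidia:  {discount:.0%}")                # 50%
print(f"100k-chip savings:    ${savings / 1e9:.1f} billion")  # $1.5 billion
```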
Forward-Looking Analysis: The Chip Alliance
This move effectively creates an "Anti-Nvidia Alliance."
- Google: Gains hardware revenue, keeps JAX relevant, and hurts its biggest rival (Nvidia).
- Meta/Apple: Gain leverage to negotiate lower prices with Nvidia and diversify their supply chains.
- Amazon: Already has Trainium, now sees validation of the non-Nvidia path.
The Market Impact: We are moving from a Monopoly (Nvidia at roughly 90% of AI accelerator share) to an Oligopoly (Nvidia, Google, AWS, AMD).
- Short Term (1-2 years): Nvidia remains king. The supply backlog is too long, and software takes time to port.
- Medium Term (3-5 years): Margins compress. This specific news is the signal that the 80% gross margins Nvidia enjoys are likely peaking.
Verdict: The hardware is real. The physics (OCS) are superior for specific workloads. If Google can deliver the supply, the "Code Red" is no longer just for Google; it's for Jensen Huang.