
The Copper Wall: Why Future AI Chips Must Run on Light

We are hitting the physical limits of electricity. Here is why Nvidia and TSMC are racing to replace copper wires with silicon photonics to save the AI industry from heat death.


A micrograph of a silicon photonics chip, which uses light to transmit data.

There is a dirty secret in the AI industry: We are running out of electricity. Not just to power the chips, but to move data between them.

For 50 years, we have relied on copper wires to move electrons from point A to point B. It worked for CPUs. It worked for early GPUs. But for modern AI training clusters—which act like one giant brain spanning thousands of chips—copper has hit a physical limit known as the “Copper Wall.”

Resistance creates heat. Capacitance slows signals down. As we try to push terabytes of data per second between Blackwell GPUs, the copper wires are turning into heaters.

The industry’s solution is radical: Stop using electricity to move data. Use light.

The Physics Problem: The SerDes “Tax”

To understand why this shift is inevitable, we have to look at the physics of data transmission, specifically the SerDes (Serializer/Deserializer).

Every time a GPU sends data, it has to:

  1. Serialize: Take parallel data and turn it into a fast stream.
  2. Amplify: Blast that stream with enough voltage to push electrons through a resistive copper wire.
  3. Equalize: The receiver has to compensate for the noise and signal degradation introduced by the wire.
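
To make the Serialize step (step 1 above) concrete, here is a minimal Python sketch of a serializer/deserializer round trip. It is purely illustrative (real SerDes blocks are high-speed mixed-signal circuits, not software, and the function names here are hypothetical), but it shows the parallel-to-serial conversion the first step describes.

```python
# Toy serializer/deserializer: turn parallel words into a serial bit
# stream and back. Illustrative only; real SerDes is analog hardware.

def serialize(words, width=8):
    """Flatten parallel words into one serial bit stream (MSB first)."""
    bits = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits

def deserialize(bits, width=8):
    """Reassemble the serial bit stream into parallel words."""
    words = []
    for i in range(0, len(bits), width):
        word = 0
        for bit in bits[i:i + width]:
            word = (word << 1) | bit
        words.append(word)
    return words

payload = [0xDE, 0xAD, 0xBE, 0xEF]
stream = serialize(payload)
assert deserialize(stream) == payload  # the round trip is lossless
```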

This process is expensive.

  • Copper Cost: Moving a bit of data across a server board costs 10-15 picojoules per bit (pJ/bit).
  • The Scale: When you are moving Exabytes of data for AI training, that adds up to megawatts.

Silicon Photonics changes the math.

  • The Light Cost: Once a photon is generated, it moves through a waveguide (a microscopic glass tunnel) with near-zero resistance.
  • Efficiency: Optical interconnects can move data for < 1 pJ/bit, roughly an order of magnitude (10-15x) better. The back-of-the-envelope sketch below shows what that means at cluster scale.
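
As a back-of-the-envelope illustration, assume a 100,000-GPU cluster in which each GPU sustains about 10 Tb/s of interconnect traffic (a round number chosen for illustration, not a vendor spec). The gap between ~12.5 pJ/bit and ~1 pJ/bit is then measured in megawatts:

```python
# Back-of-the-envelope I/O power estimate. All inputs are illustrative
# assumptions (GPU count, per-GPU bandwidth), not vendor specifications.

def io_power_watts(num_gpus, per_gpu_tbps, energy_pj_per_bit):
    """Power spent purely on moving bits between GPUs, in watts."""
    bits_per_second = num_gpus * per_gpu_tbps * 1e12
    return bits_per_second * energy_pj_per_bit * 1e-12

copper  = io_power_watts(100_000, 10, 12.5)  # ~10-15 pJ/bit over copper
optical = io_power_watts(100_000, 10, 1.0)   # < 1 pJ/bit over light

print(f"Copper : {copper / 1e6:.1f} MW")   # -> 12.5 MW just to move data
print(f"Optical: {optical / 1e6:.1f} MW")  # ->  1.0 MW
```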

The Technology: How to Bend Light on a Chip

How do you put lasers on a computer chip? It’s harder than it sounds.

1. The Waveguides (Silicon Nitride)

You can’t just shine light through bulk silicon; it scatters and gets absorbed. Instead, engineers etch microscopic waveguides into Silicon Nitride or Silicon-on-Insulator (SOI) layers on the chip.

  • Total Internal Reflection: Just like the fiber optic cables under the ocean, these tunnels bounce light off their own walls, allowing data to turn 90-degree corners on a chip with minimal signal loss.
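
Confinement works because the waveguide core has a much higher refractive index than the cladding around it. A quick calculation with standard textbook index values at ~1550 nm (silicon ≈ 3.48, silica cladding ≈ 1.44, assumed here for illustration) shows why the light stays trapped:

```python
import math

# Critical angle for total internal reflection: sin(theta_c) = n_clad / n_core.
# Index values are standard ~1550 nm textbook figures, used here as assumptions.
n_core = 3.48   # crystalline silicon core
n_clad = 1.44   # silicon dioxide (silica) cladding

theta_c = math.degrees(math.asin(n_clad / n_core))
print(f"Critical angle: {theta_c:.1f} degrees from the wall's normal")
# -> ~24.4 degrees: any ray hitting the wall more than ~24.4 degrees away
#    from the normal (i.e. at a grazing angle) is totally reflected back
#    into the core, which is why light survives tight bends on the chip.
```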

2. The Modulation (Ring Resonators)

You need to turn the light “On” and “Off” billions of times a second to encode data (1s and 0s). Engineers do this with Micro-Ring Modulators.

  • Imagine a tiny ring of silicon next to the waveguide. When you apply a tiny voltage to the ring, its refractive index changes (the Electro-Optic Effect).
  • That index change shifts the ring in and out of resonance, flipping the link between “transparent” and “opaque.” The switch happens in picoseconds.
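
A toy model makes this concrete: the ring’s resonance shows up as a sharp dip in the light transmitted past it, and applying a voltage shifts that dip away from the laser wavelength. The Lorentzian line shape and the numbers below are illustrative assumptions, not measured device data.

```python
# Toy micro-ring modulator: transmission past the ring vs. detuning between
# the laser and the ring's resonance. A Lorentzian dip stands in for the
# resonance; linewidth, shift, and extinction are assumed illustrative values.

def transmission(detuning_ghz, linewidth_ghz=10.0, extinction=0.95):
    """Fraction of light that makes it past the ring at a given detuning."""
    lorentzian = 1.0 / (1.0 + (2.0 * detuning_ghz / linewidth_ghz) ** 2)
    return 1.0 - extinction * lorentzian

# With no voltage, the laser sits on the ring's resonance: the ring pulls
# the light out of the waveguide and the link goes dark (a "0").
bit_0 = transmission(0.0)

# Applying a voltage shifts the resonance by an assumed 25 GHz, which is
# equivalent to detuning the laser by 25 GHz: light passes again (a "1").
bit_1 = transmission(25.0)

print(f"'0' level: {bit_0:.2f}")  # ~0.05 (opaque)
print(f"'1' level: {bit_1:.2f}")  # ~0.96 (transparent)
```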

3. The Thermal Nightmare (Thermo-Optic Effect)

Here is the catch: Silicon changes its optical properties as it heats up. A 1-degree temperature change can shift the ring off resonance, de-tuning it from the laser and blinding the connection.

  • The Fix: Every single ring modulator has a tiny micro-heater built into it. A control loop constantly adjusts this heater to keep the ring at the exact temperature required to stay on resonance, fighting the heat coming off the GPU next door.
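
Conceptually the control loop is simple feedback: measure how far the ring has drifted from its target operating point (in practice this is inferred optically, for example from a photodiode tap) and adjust the heater to cancel the drift. Below is a minimal proportional-control sketch with made-up coefficients; real controllers use dither-locking or PID loops in mixed-signal hardware.

```python
import random

# Minimal proportional control loop for thermal tuning of one ring.
# All coefficients here are made-up illustrative numbers.

TARGET_C = 55.0        # temperature at which the ring resonates correctly
GAIN = 0.8             # fraction of the error corrected per control step
HEATER_COEFF = 0.4     # degrees C of ring heating per mW of heater power

heater_mw = 5.0
for step in range(8):
    # Disturbance: the GPU next door heats the ring unpredictably.
    disturbance_c = random.uniform(-0.5, 0.5)

    # Ring temperature = ambient baseline + heater contribution + disturbance.
    ring_c = 50.0 + HEATER_COEFF * heater_mw + disturbance_c

    # Feedback: nudge the heater to pull the ring back toward the target.
    error_c = TARGET_C - ring_c
    heater_mw += GAIN * error_c / HEATER_COEFF

    print(f"step {step}: ring = {ring_c:.2f} C, heater = {heater_mw:.2f} mW")
```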

Co-Packaged Optics (CPO): The End of Pluggables

For years, “optical” meant the discrete pluggable modules on the back of a server (QSFP-DD transceivers). The electrons had to travel roughly 10 inches across the motherboard just to reach that plug, and that 10-inch journey is where the energy is lost.

COUPE (Compact Universal Photonic Engine) is TSMC’s answer. It uses 3D Stacking to place the photonic engine directly on top of (or right next to) the GPU die.

  • The Path: The electrical signal travels microns, not inches.
  • The Latency: 130 picoseconds (optical) vs 330 picoseconds (copper) for on-chip global interconnects.
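
Raw propagation delay alone shows why shortening the electrical path matters. A signal on an FR4 motherboard trace travels at roughly half the speed of light, while the co-packaged path is only a few hundred microns long. The numbers below are rough assumptions for illustration, separate from the link-level figures quoted above.

```python
# Rough propagation-delay comparison: board-level trace vs. co-packaged path.
# Assumptions: ~0.5c signal velocity on an FR4 PCB trace, ~0.7c in the
# photonic path; both path lengths are illustrative round numbers.

C = 3.0e8                # speed of light in vacuum, m/s

pcb_trace_m = 0.254      # ~10 inches across the motherboard to a pluggable
cpo_path_m = 500e-6      # ~500 microns across the package/interposer

pcb_delay_ps = pcb_trace_m / (0.5 * C) * 1e12
cpo_delay_ps = cpo_path_m / (0.7 * C) * 1e12

print(f"10-inch board trace: {pcb_delay_ps:.0f} ps")   # ~1700 ps
print(f"Co-packaged path   : {cpo_delay_ps:.1f} ps")   # ~2.4 ps
```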

The “External Laser Source” (ELS) Workaround

There is one major problem with putting optics on a hot GPU: Heat kills lasers. Lasers degrade rapidly above 70°C. GPUs run at 80°C+.

The solution is the External Laser Source (OIF ELSFP Standard).

  • Remote Power: The actual laser diode is placed in a cool box on the front of the server rack (the “power supply” for light).
  • Fiber Delivery: It shoots a “blank” beam of continuous light into the hot GPU package.
  • Modulation: The GPU’s photonics chip just modulates (chops up) that light. It doesn’t generate it.
  • Benefit: If the laser dies, you just swap the pluggable module on the front. You don’t throw away the $30,000 GPU.
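
Because the laser sits outside the package, the “blank” light has to survive a chain of losses (fiber coupling into the package, on-chip routing, the modulator itself, and the fiber back out) before it reaches a receiver. The link-budget sketch below uses assumed, round-number losses for illustration, not figures from the ELSFP standard.

```python
# Toy optical link budget for an external-laser-source (ELS) style link.
# Every number here is an assumed, round figure for illustration only.

laser_output_dbm = 13.0          # continuous-wave laser in the front-panel module

losses_db = {
    "fiber-to-package coupling": 1.5,
    "on-chip waveguide routing": 1.0,
    "ring modulator insertion":  3.0,
    "package-to-fiber coupling": 1.5,
    "fiber and connectors":      1.0,
}

received_dbm = laser_output_dbm - sum(losses_db.values())
receiver_sensitivity_dbm = -8.0  # assumed photodetector sensitivity

margin_db = received_dbm - receiver_sensitivity_dbm
print(f"Received power: {received_dbm:.1f} dBm, margin: {margin_db:.1f} dB")
# -> 5.0 dBm received, 13.0 dB of margin under these assumptions
```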

The Players: Who is Winning?

1. The Giants: Nvidia & TSMC

Nvidia has already signaled that its future NVLink interconnects will go optical. The timeline is rumored to be 2026-2027, alongside their “Rubin” architecture. They are relying on TSMC’s COUPE heterogeneous integration to make this mass-producible.

2. The Disruptors: Lightmatter & Ayar Labs

Startups are moving even faster.

  • Lightmatter: Their “Passage” product is a wafer-scale interconnect. Imagine a giant silicon wafer that acts as a switchboard for light. You place dozens of GPUs on top of it, and they communicate via light without any cables at all.
  • Ayar Labs: They championed the remote laser concept (SuperNova). Their TeraPHY chiplets are arguably the most advanced “drop-in” optical I/O solution today.

The Future

We talk a lot about Moore’s Law (transistor density), but the real bottleneck today is the I/O Wall. A 100,000-GPU cluster is not limited by how fast it can think; it is limited by how fast it can share thoughts.

The era of the “Copper Chip” is ending. The next generation of supercomputers will be hybrids: Electronic brains thinking with electrons, but communicating with an optical nervous system of light.
