
CXL: The "Infinite RAM" Glitch

If we can't make RAM fast enough, we have to cheat. The industry's answer to the shortage isn't just "make more chips" - it's a new protocol called CXL that finally lets us "download more RAM".


Futuristic visualization of a CXL memory pool, with glowing data streams connecting server racks

Key Takeaways

  • The “Download RAM” Reality: CXL 3.0 allows servers to borrow memory from a shared pool over a PCIe-like bus, effectively solving physical slot limitations.
  • Solving the 25% Waste: Up to a quarter of data center memory sits idle (“stranded”) because it’s locked to a CPU that doesn’t need it. CXL liberates this capacity.
  • Speed of Light Physics: By using PCIe 6.0 PHY and PAM-4 signaling, CXL 3.0 achieves 64 GT/s, making remote memory fast enough to feel local.
  • The End of the Motherboard: We are moving toward “disaggregated computing” where CPU, RAM, and Storage are separate appliances connected by light and copper.

Introduction

For decades, the "download more RAM" meme has been the ultimate litmus test for computer literacy. You can't download hardware. RAM is a physical stick of silicon and capacitors, soldered or slotted directly next to the CPU because the speed of light is too slow to put it anywhere else.

But in 2025, the joke is becoming a reality. The industry has hit a wall: CPU core counts are exploding, but the number of memory channels we can jam onto a motherboard is not. We are starving our processors.

Enter Compute Express Link (CXL), specifically the 3.0 revision. It is a protocol that allows us to break the laws of traditional server architecture. By decoupling memory from the CPU and placing it into a shared “pool,” CXL allows one server to access memory physically located in another drawer—or even another rack—at speeds near enough to local DRAM that the software can’t tell the difference.

It is the closest we will ever get to an “Infinite RAM” glitch. Here is the physics, the economics, and the engineering behind the most important hardware shift of the decade.

Background: The “Stranded Memory” Crisis

To understand CXL, you have to understand the economic nightmare of modern hyper-scalers (like Google, AWS, and Azure).

The 25% Tax

In a traditional server, you buy a CPU and you fill the slots with RAM. If that server runs a compute-heavy task that uses 100% of the CPU but only 10% of the RAM, 90% of that RAM is wasted. It is “stranded.” It cannot be lent to the server next door that is crashing because it ran out of memory.

According to research from Microsoft and Marvell, approximately 25% of all data center memory is stranded at any given moment.

$$\text{Wasted Capital} = \text{Total DRAM Spend} \times 0.25$$

With server DRAM sales projected to hit $40 billion by 2028, that is a $10 billion annual loss. In an era where AI models are doubling in size every few months, leaving that much capacity on the table is unacceptable.
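
As a quick sanity check on those figures (plain arithmetic using the numbers above, nothing more):

```python
# Back-of-the-envelope cost of stranded DRAM, using the article's figures.
total_dram_spend = 40e9    # USD: projected server DRAM sales by 2028
stranded_fraction = 0.25   # share of DRAM idle ("stranded") at any moment

wasted_capital = total_dram_spend * stranded_fraction
print(f"Wasted capital: ${wasted_capital / 1e9:.1f}B per year")  # -> $10.0B per year
```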

Understanding CXL 3.0: The Physics of Cheating

CXL isn’t a new cable; it’s a protocol that rides on top of the PCIe electrical interface. If PCIe is the road, CXL is the traffic rules that allow cars (data) to drive at Ferrari speeds without crashing.

How It Works: Creating “Coherency”

The magic of CXL is Cache Coherency.

Normally, if a device (like a GPU or an SSD) writes to memory, the CPU doesn't instantly know about it. The CPU might be holding an old copy of that data in its L1/L2 cache. This mismatch causes bugs. CXL introduces three protocols to fix this (a toy model of the resulting guarantee follows the list):

  1. CXL.io: Discovery and configuration (basic PCIe stuff).
  2. CXL.cache: Allows devices to snoop the CPU’s cache.
  3. CXL.mem: Allows the CPU to read/write the device’s memory as if it were system RAM.
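
To make that concrete, here is a deliberately simplified toy model of snoop-invalidate behavior (illustrative Python objects, not the real CXL.cache state machine, which is a hardware MESI-style protocol; the invariant is simply that a write forces stale copies elsewhere to be dropped):

```python
# Toy model of snoop-based coherency. Illustrative only: the real
# CXL.cache protocol is a hardware state machine with message channels.

class CoherentBus:
    """Stands in for the CXL link carrying snoop traffic."""
    def __init__(self):
        self.agents = []
        self.memory = {}

    def invalidate(self, addr, initiator):
        for agent in self.agents:
            if agent is not initiator:
                agent.cache.pop(addr, None)       # drop stale copies

class CacheAgent:
    """A CPU or CXL device holding cached copies of memory lines."""
    def __init__(self, name, bus):
        self.name, self.cache, self.bus = name, {}, bus
        bus.agents.append(self)

    def write(self, addr, value):
        # Before writing, snoop-invalidate every other agent's copy,
        # so nobody keeps serving a stale value from its L1/L2.
        self.bus.invalidate(addr, initiator=self)
        self.cache[addr] = value
        self.bus.memory[addr] = value             # write-through, for simplicity

    def read(self, addr):
        if addr not in self.cache:                # cache miss -> fetch
            self.cache[addr] = self.bus.memory[addr]
        return self.cache[addr]

bus = CoherentBus()
cpu, gpu = CacheAgent("CPU", bus), CacheAgent("GPU", bus)
cpu.write(0x100, "v1")
print(gpu.read(0x100))   # v1 -- fetched from memory
cpu.write(0x100, "v2")   # snoop invalidates the GPU's cached copy
print(gpu.read(0x100))   # v2 -- re-fetched, never stale
```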

CXL 3.0 takes this a step further with Memory Pooling.

The Bandwidth Math

CXL 3.0 typically uses the PCIe 6.0 physical layer, which relies on PAM-4 (Pulse Amplitude Modulation, 4-level) signaling. Instead of sending 0s and 1s (NRZ), it sends four voltage levels (00, 01, 10, 11), doubling the bits carried per symbol at the same clock rate.

$$\text{Throughput} = \frac{64\ \text{GT/s} \times 16\ \text{lanes}}{8\ \text{bits/byte}} \times \frac{242}{256}\ (\text{FLIT/FEC overhead}) \approx 121\ \text{GB/s}$$

While this is slower than the absolute fastest local DDR5 (which can push 200+ GB/s across multiple channels), it is fast enough for “Expansion Memory”—the tier of RAM used when the local allocation fills up.
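
The same arithmetic as a quick, self-contained sanity check (a simplification that folds all protocol overhead into the FLIT payload ratio):

```python
# Rough effective-bandwidth estimate for a CXL 3.0 x16 link (PCIe 6.0 PHY).
transfer_rate_gts = 64      # GT/s per lane; PAM-4's 2 bits/symbol is
                            # already reflected in the 64 GT/s figure
lanes = 16
flit_payload = 242 / 256    # usable bytes per 256-byte PCIe 6.0 FLIT

raw_gbs = transfer_rate_gts * lanes / 8        # 1024 Gb/s -> 128 GB/s per direction
effective_gbs = raw_gbs * flit_payload         # ~121 GB/s usable
print(f"raw: {raw_gbs:.0f} GB/s, effective: {effective_gbs:.0f} GB/s")
```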

The Evolution: Why CXL 3.0 is the “Glitch”

To understand why CXL 3.0 is a revolution, we must look at its predecessors.

  • CXL 1.0/1.1 was a simple point-to-point connection. You could plug a memory expander card into a server, and the CPU would see it. Useful, but static. It solved the “I need more RAM” problem but not the “Stranded Memory” problem.
  • CXL 2.0 introduced switching. Like a network switch, you could connect multiple devices to a single host, or partition a device among hosts. This was the birth of pooling, but it was limited to a single “fan-out” hierarchy.
  • CXL 3.0 is where the physics gets interesting. It introduces “Fabrics.” Instead of a simple tree structure, devices can speak to each other in a peer-to-peer mesh. A GPU can talk directly to a CXL memory stick without waking up the CPU. This reduces latency and mimics the “Infinite RAM” behavior of a localized cluster. It turns the entire data center into one giant motherboard.

The Deep Dive: Disaggregation and Fabrics

The implications of this go beyond just saving money. It changes how we build servers.

The Death of the Motherboard

Traditionally, a motherboard is a strict map: 2 CPU sockets, 32 DIMM slots. You live and die by that map. With CXL, the “motherboard” becomes a “backplane.”

  • Drawer 1: Just CPUs (Compute).
  • Drawer 2: Just RAM (Memory Pool).
  • Drawer 3: Just GPUs (Accelerators).

They connect via a CXL Switch. If Server A needs 1TB of RAM today for a massive database compile, the Switch assigns it. If it needs only 32GB tomorrow, the Switch reclaims that RAM and gives it to Server B for AI training.
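
Here is a deliberately simplified sketch of the switch's bookkeeping (the class, GB-granularity leases, and host names are invented for illustration; real CXL fabric managers allocate device-defined extents):

```python
# Hypothetical bookkeeping for a CXL switch managing a shared memory pool.

class MemoryPool:
    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.assigned = {}                 # host -> GB currently leased

    def assign(self, host, gb):
        if gb > self.free_gb:
            raise MemoryError(f"pool exhausted: {self.free_gb} GB free")
        self.free_gb -= gb
        self.assigned[host] = self.assigned.get(host, 0) + gb

    def reclaim(self, host, gb):
        gb = min(gb, self.assigned.get(host, 0))
        self.assigned[host] -= gb
        self.free_gb += gb

pool = MemoryPool(capacity_gb=4096)        # a 4 TB memory drawer
pool.assign("server-a", 1024)              # big database compile today
pool.reclaim("server-a", 992)              # down to 32 GB tomorrow
pool.assign("server-b", 2048)              # freed capacity goes to AI training
print(pool.assigned, pool.free_gb)         # {'server-a': 32, 'server-b': 2048} 2016
```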

Multi-Logical Devices (MLDs)

CXL 3.0 introduces MLDs, allowing a single stick of CXL memory to be sliced up and shared among up to 16 hosts simultaneously. Inside the memory controller, a unit called the Channel Management Unit (CHMU) acts as a traffic cop, ensuring that when Host A writes to Address 0x100, it doesn’t overwrite Host B’s data at the same physical address (unless they are intentionally sharing data, which CXL also supports).
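
A hypothetical sketch of that isolation (the slice size, function name, and flat offset scheme are invented for illustration; real controllers use programmable address decoders):

```python
# Hypothetical per-host address decode inside an MLD memory controller.
# Each logical device (one per host, up to 16) gets its own base offset,
# so identical host addresses land in disjoint physical regions.

SLICE_BYTES = 1 << 30                       # 1 GiB per logical device (illustrative)

def device_physical_address(logical_device_id: int, host_addr: int) -> int:
    assert 0 <= logical_device_id < 16      # MLDs support up to 16 hosts
    assert host_addr < SLICE_BYTES          # address must fit inside the slice
    return logical_device_id * SLICE_BYTES + host_addr

# Host A and Host B both write to "0x100" -- no collision:
print(hex(device_physical_address(0, 0x100)))   # 0x100
print(hex(device_physical_address(1, 0x100)))   # 0x40000100
```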

Industry Impact

Impact on AI Training

AI training is memory-bound. The H100 and B200 GPUs are beasts, but they are often starving for data. CXL allows significantly larger model weights to be held in “near-memory,” reducing the need to fetch data from slow NVMe SSDs.

Impact on Cloud Pricing

This efficiency will likely lead to “Spot RAM” instances. Just as you can buy spot EC2 instances on AWS, you might soon be able to dynamically burst your RAM capacity for pennies on the dollar, using the stranded capacity of the rack next to you.

Challenges & Limitations

  1. Latency Penalty: Physics is stubborn. Adding a switch and a few feet of cable adds nanoseconds. CXL memory has about 170-250ns of latency, compared to 80-100ns for local DRAM. It’s a “Tier 2” memory—faster than SSDs, but slower than local RAM.
  2. Signal Integrity: PAM-4 signaling is fragile. At 64 GT/s, a signal degrades after just a few inches of PCB trace. Complex retimers and high-quality cabling are required, raising the upfront cost.
  3. Software Support: Operating systems need to be “CXL-aware.” The OS kernel must be smart enough to place “hot” data in local RAM and “warm” data in CXL RAM (Tiering), as sketched below.
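
A toy version of that tiering decision (the threshold and latency labels are illustrative; the real logic lives in the kernel's memory-tiering code, not in user space):

```python
# Toy tiering policy: frequently accessed pages stay in local DRAM,
# colder pages are demoted to the CXL pool.

HOT_THRESHOLD = 8            # accesses per scan interval (made-up number)

def place_page(access_count: int) -> str:
    """Decide which memory tier a page belongs in."""
    if access_count >= HOT_THRESHOLD:
        return "local DRAM (~80-100 ns)"
    return "CXL pool (~170-250 ns)"

for accesses in (32, 9, 3, 0):
    print(f"{accesses:>3} accesses -> {place_page(accesses)}")
```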

Forward-Looking Analysis: The 5-Year Outlook

Short-Term: The Hybrid Era (2025-2026)

We will see “CXL Expanders”—add-in cards that look like SSDs but contain RAM. These will be plugged into standard servers to add 512GB or 1TB of cheap capacity.
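
On Linux today, such an expander typically surfaces as a CPU-less NUMA node. A small script along these lines (assuming the standard sysfs layout; details vary by kernel and platform) can spot one:

```python
# List NUMA nodes and flag CPU-less ones -- on Linux, CXL memory
# expanders typically appear as NUMA nodes with memory but no CPUs.
from pathlib import Path

for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    cpus = (node / "cpulist").read_text().strip()   # empty for memory-only nodes
    kind = "CPU-less (possibly CXL expander)" if not cpus else f"CPUs {cpus}"
    print(f"{node.name}: {kind}")
```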

Medium-Term: Rack-Scale Architecture (2027+)

The server chassis will disappear. We will buy “Compute Bricks” and “Memory Bricks.” Data centers will look less like rows of pizza boxes and more like massive, liquid-cooled hives of disaggregated resources.

The “Infinite Memory” Horizon

Eventually, applications won’t know where their memory is. An application could request 100TB of RAM, and the CXL fabric will cobble it together from stranded capacity across the entire data center floor.

Conclusion

CXL is the industry’s admission that Moore’s Law for pins and wires has ended. We can’t cram more wires into a CPU socket. So, we stopped trying to make the pipe bigger, and instead built a smarter plumbing system.

For the average user, this means cloud services that are cheaper and faster. For the engineer, it means the end of “Out of Memory” errors. The RAM isn’t infinite, but for the first time in history, it’s finally fluid.
