CXL Spec Forging Ahead with High Performance Computing

Article By: Gary Hilson

Running over the PCIe physical layer makes the open interconnect easier to adopt...

The Compute Express Link (CXL) specification is forging ahead at a steady pace. Version 2.0 of the open industry-standard interconnect is now available less than two years after the specification's inception, with consortium member vendors already releasing products using the latest iteration. Like the now mature Non-Volatile Memory Express (NVMe) interface specification, CXL 2.0 adds new features and functionality to meet increased performance demands while staying backward compatible with its predecessors: CXL 1.0 was released in March 2019 and CXL 1.1 was announced in June of the same year.

Updates in 2.0 are being driven by rapidly evolving data center architectures that must support the growing demands of emerging workloads for artificial intelligence (AI) and machine learning (ML). The continued proliferation of cloud computing and the “cloudification” of the network and edge are also factors. In a briefing for EE Times, CXL Consortium director Larrie Carr said these trends are compounding existing challenges in the data center, such as increasing demand for heterogeneous computing and server disaggregation, and a need for greater memory capacity and bandwidth.
CXL’s three protocols can be used alone or in combination for specific use cases, such as accelerators with memory to support dense computation, or memory buffers to support memory capacity expansion and storage class memory. (Courtesy CXL Consortium)
Running over the PCIe physical layer, CXL is an open industry-standard interconnect that offers coherency and memory semantics using high-bandwidth, low-latency connectivity between the host processor and devices such as accelerators, memory buffers, and smart I/O devices. Among the additional features in version 2.0 are support for switching to enable device fan-out, memory scaling, expansion, and the migration of resources; memory pooling support to maximize memory utilization, limiting or eliminating the need to overprovision memory; and standardized management of the persistent memory interface while enabling simultaneous operation alongside DDR, freeing up DDR for other uses.

The updates in CXL 2.0 make it easier to assign an end device to any one of 16 hosts through the concept of memory assignment, said Carr. “If a given host does not want to use the end device anymore, the CXL switch would allow and manage a hot plug event for that device to be disconnected for one host and reassigned to another host for memory pooling.”

As part of the overall CXL 2.0 specification update, a working group was established to focus on how to provide a standardized interface to persistent memory. By defining a standard API for management, he said, anyone can add persistent memory to a CXL-connected port in a standard manner. Similar to NVMe, “CXL allows anyone to bring their memory technologies to market and leverage the existing software ecosystem.” Combined with the switching capabilities, said Carr, there is a great deal of flexibility in how memories can be tapped into.

CXL can be further broken down into its three protocols: CXL.io, CXL.cache, and CXL.memory, which can be used alone or in combination for specific use cases. An example of all three being used at the same time might be accelerators with memory to support dense computation, while memory buffers would pair CXL.io and CXL.memory to support memory capacity expansion and storage class memory.

CXL’s fast development and uptake can be attributed to broad industry support and participation in development of the standard, which transcends memory and component vendors to include big players such as Google, IBM, Facebook, and Intel on its board of directors, the latter of which is often seen as driving the market when it comes to semiconductor segments such as memory.

As a CXL Consortium member, Microchip Technology was quick out of the gate with a CXL 2.0 product, announcing its latest low-latency PCI Express 5.0 and CXL 2.0 retimers, known as XpressConnect. Like the overall CXL specification, the retimers address the high-performance computing demands of data center workloads by supporting the ultra-low-latency signal transmission required for AI, ML, and other computational workloads, even advanced driver-assistance systems (ADAS) in vehicles, said Ahmad Danesh, Microchip’s product marketing and strategy manager for data center solutions.

PCIe retimers are usually implemented as an integrated circuit (IC) placed on a PCB that can be used to extend the length of a PCIe bus. The retimer takes care of the discontinuities caused by interconnect, PCB, and cable changes that lead to poor PCIe signals by outputting a regenerated signal, in both directions, as if it were a fresh PCIe device.
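As a rough illustration of the protocol combinations described above, the short C sketch below models CXL.io, CXL.cache, and CXL.memory as bit flags and lists the two use cases Carr highlighted: an accelerator with memory using all three protocols, and a memory buffer pairing CXL.io with CXL.memory. The structure and names are purely illustrative and are not taken from the CXL specification or any vendor SDK.

```c
#include <stdio.h>

/* Each CXL protocol modeled as a bit flag (illustrative only). */
enum cxl_protocol {
    CXL_IO    = 1 << 0,  /* CXL.io: discovery, configuration, I/O (PCIe-like) */
    CXL_CACHE = 1 << 1,  /* CXL.cache: device coherently caches host memory   */
    CXL_MEM   = 1 << 2,  /* CXL.memory: host accesses device-attached memory  */
};

/* Hypothetical use cases built from the combinations named in the article. */
struct cxl_use_case {
    const char *name;
    unsigned protocols;
};

static const struct cxl_use_case use_cases[] = {
    /* Accelerator with memory: all three protocols at once. */
    { "accelerator with memory", CXL_IO | CXL_CACHE | CXL_MEM },
    /* Memory buffer / capacity expander: CXL.io paired with CXL.memory. */
    { "memory buffer",           CXL_IO | CXL_MEM },
};

int main(void) {
    for (size_t i = 0; i < sizeof(use_cases) / sizeof(use_cases[0]); i++) {
        const struct cxl_use_case *u = &use_cases[i];
        printf("%s: %s%s%s\n", u->name,
               (u->protocols & CXL_IO)    ? "CXL.io "    : "",
               (u->protocols & CXL_CACHE) ? "CXL.cache " : "",
               (u->protocols & CXL_MEM)   ? "CXL.memory" : "");
    }
    return 0;
}
```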
Among the additional features in CXL 2.0 is support for switching to enable device fan-out. (Courtesy CXL Consortium)
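To make the pooling flow Carr describes more concrete, here is a hypothetical sketch of the bookkeeping a switch or fabric manager might do when a pooled memory device moves between hosts: a managed hot-remove from the current owner followed by a hot-add to the new one. The types and function names are invented for illustration and do not correspond to any real CXL fabric-management API.

```c
#include <stdio.h>

#define CXL_MAX_HOSTS 16   /* CXL 2.0 lets a device be assigned to any of 16 hosts */
#define UNASSIGNED    (-1)

/* Hypothetical record a switch/fabric manager might keep for one pooled device. */
struct pooled_device {
    const char *name;
    int owner_host;        /* index of the host the device is currently bound to */
};

/* Simulated managed hot-plug: detach from the current owner, attach to a new host. */
static int reassign_device(struct pooled_device *dev, int new_host) {
    if (new_host < 0 || new_host >= CXL_MAX_HOSTS)
        return -1;         /* outside the 16-host assignment range */

    if (dev->owner_host != UNASSIGNED)
        printf("hot-remove: %s detached from host %d\n", dev->name, dev->owner_host);

    dev->owner_host = new_host;
    printf("hot-add:    %s attached to host %d\n", dev->name, new_host);
    return 0;
}

int main(void) {
    struct pooled_device expander = { "memory-expander-0", UNASSIGNED };

    reassign_device(&expander, 3);   /* host 3 draws capacity from the pool   */
    reassign_device(&expander, 7);   /* later, the same device moves to host 7 */
    return 0;
}
```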
Because CXL leverages PCIe, Microchip’s XpressConnect retimers support a wide range of PCIe and CXL devices and are available in multiple lane-count variants of up to 16 lanes of PCIe Gen 5, he said. By reusing the PCIe 5.0 physical layer, CXL creates a protocol overlay to support a whole host of new components that can be attached to provide high-bandwidth, low-latency connectivity between compute, accelerators, memory devices, and smart I/O devices, including smart NICs.

However, there are challenges when connecting extremely low-latency devices via CXL, said Danesh. Because the PCIe retimer specs initially came out for Gen 3, the latency targets and retimer architectures of that era weren’t designed with CXL’s use cases in mind; the focus was on handling block I/O transactions and talking to NVMe drives. “CXL comes along and it broke the model. You have these very latency sensitive systems,” he said, and this challenge is further compounded by CXL 2.0 adding the ability to provide memory expansion through the use of switches.

Microchip didn’t have to revamp an existing product, however, as this is the company’s first retimer, and it’s designed for CXL 1.1, CXL 2.0, and PCIe Gen 5, the latter of which is also a departure in some ways from its predecessors, said Danesh. “It took years to get from Gen3 to Gen4 and very quickly we transitioned to Gen5.”

A significant change in the last few years has been that applications are driving unprecedented growth not only in data itself but in the amount of computation that needs to happen on that data, he said. “We have more sources of data creating these increasingly larger data sets and we also now need more efficient ways to access and process data.” That means all the “plumbing” has to keep up with faster speeds and the new challenges something like CXL creates, such as maintaining signal integrity in a system where there are large enclosures running PCIe Gen 5 and CXL 2.0 signals, said Danesh. “You need retimers just to physically solve those reach problems. We’ve addressed the reach problem by providing lower cost devices, so you can use a retimer and then you can use less expensive board materials and less expensive cables to get to that same distance.”

He said a lot of applications are going to require a retimer as they take advantage of the various devices being connected via CXL, including non-volatile persistent memory devices and volatile DRAM. “The latencies you’re dealing with are in the tens of nanoseconds, not in the thousands of nanoseconds like it was for PCIe attached devices.”

Danesh said the reuse of the PCIe physical layer allows for simple adoption of CXL because it has native support on the CPU side, unlike Gen-Z, which requires translation from CXL, and he expects an uptick in devices supporting CXL 1.1 this year.
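Danesh’s point about tens-of-nanosecond budgets can be illustrated with a simple latency tally for a hypothetical CXL.mem access path through a retimer and a switch. All of the numbers below are placeholders chosen only to show the arithmetic of budgeting a path; they are not specified or measured values for any real device.

```c
#include <stdio.h>

/* Illustrative latency budgeting for a hypothetical CXL.mem access path.
 * Every figure is a placeholder used to demonstrate the tally, not a
 * specified or measured value for any real component. */

struct path_component {
    const char *name;
    double latency_ns;            /* one-way added latency, hypothetical */
};

int main(void) {
    const struct path_component path[] = {
        { "host root port",  10.0 },
        { "retimer hop",      5.0 },
        { "CXL 2.0 switch",  25.0 },
        { "memory expander", 40.0 },
    };
    const double budget_ns = 100.0;   /* hypothetical load-to-use target */

    double total = 0.0;
    for (size_t i = 0; i < sizeof(path) / sizeof(path[0]); i++) {
        total += path[i].latency_ns;
        printf("%-16s +%5.1f ns (running total %6.1f ns)\n",
               path[i].name, path[i].latency_ns, total);
    }
    printf("budget %.1f ns -> %s\n", budget_ns,
           total <= budget_ns ? "fits" : "exceeds budget");
    return 0;
}
```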
