Has Cerebras solved the classic wafer-scale challenges?
Amid a flurry of announcements from SC ’19, there was one in particular that caught my eye.
You may remember EE Times’ coverage of the Cerebras wafer-scale AI accelerator from back in the summer. The company’s Wafer Scale Engine (WSE) is made up of processing tiles that fill an entire wafer: in total there are 1.2 trillion transistors, 400,000 compute cores and 18 GB of on-chip memory. In short, it’s a monster, but a monster built specifically for the unique challenges of processing AI workloads.
Attempts at wafer-scale devices over the decades have been unsuccessful, so there were, of course, numerous questions about Cerebras’ approach when it unveiled its device. For example, how does the company deal with the wafer defects that are inevitable given normal manufacturing yield? Cerebras assured us it can successfully route around them with software. And how do you program this thing? Again, Cerebras said it has this under control, with a software platform that includes a graph compiler and fits into existing workflows.
Powering and cooling such a large device was another big question mark. With the launch of its CS-1 system, a computer built around this extraordinary device, Cerebras has now revealed some of the practical details.
The Cerebras Wafer Scale Engine is effectively a single chip the size of an entire wafer (Image: Cerebras)
The CS-1 is 26 inches tall and occupies a third of a data centre rack (15 rack units). According to the company, this system can replace hundreds or thousands of GPUs, which would need dozens of racks. Each system contains one WSE, which is fed with 1.2 Tbps of data over twelve 100-Gigabit Ethernet links.
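As a quick check on the bandwidth figure, the arithmetic can be written out as a small Python sketch. It uses only the numbers quoted above; the conversion to gigabytes per second is our own addition and ignores Ethernet framing and protocol overhead, so real usable throughput would be somewhat lower.

```python
# Aggregate input bandwidth of one CS-1, from the figures quoted in the article:
# twelve 100-Gigabit Ethernet links feeding a single WSE.
LINKS = 12
GBPS_PER_LINK = 100  # nominal line rate of one 100GbE link, in Gbit/s


def aggregate_tbps(links: int, gbps_per_link: float) -> float:
    """Total nominal line rate in Tbit/s (1 Tbit/s = 1000 Gbit/s)."""
    return links * gbps_per_link / 1000


def aggregate_gbytes_per_s(links: int, gbps_per_link: float) -> float:
    """Same total in gigabytes per second (8 bits per byte), ignoring protocol overhead."""
    return links * gbps_per_link / 8


if __name__ == "__main__":
    print(f"{aggregate_tbps(LINKS, GBPS_PER_LINK):.1f} Tbps")          # 1.2 Tbps
    print(f"{aggregate_gbytes_per_s(LINKS, GBPS_PER_LINK):.0f} GB/s")  # 150 GB/s
```

In other words, the quoted 1.2 Tbps is simply the sum of the twelve link rates, or roughly 150 GB/s of raw data into the wafer.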
Documents released by Cerebras acknowledge that “powering and cooling the world’s largest and fastest processor chip is an exceptionally challenging undertaking,” and while they don’t say exactly how much power the system requires, they do give us a clue. According to Cerebras, the system uses less than one-tenth of the power (per compute unit) of a GPU-based system, and a GPU-based system on the same scale would require hundreds of kilowatts. It seems reasonable, then, to assume the CS-1 draws power in the tens of kilowatts.
With three CS-1s fitting in each rack, that adds up to a hefty amount of power. Estimates for the highest power density in today’s datacentres range from 15 to 20 kW per rack up to 40 kW per rack, so even at the low end of our estimate for the CS-1, a fully populated rack would draw more than many datacentres are built to deliver. For resilience, the system uses twelve power supplies in a 6+6 redundant configuration.
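The power reasoning above can be made explicit with a back-of-envelope Python sketch. Cerebras has not published the CS-1’s power draw, so the per-system figures of 10, 20 and 30 kW are hypothetical values chosen to span “tens of kilowatts”; the per-rack limits are the 15 to 40 kW estimates quoted above. The redundancy check assumes, as a 6+6 configuration usually implies, that any six of the twelve supplies can carry the full load.

```python
# Back-of-envelope rack power check for the CS-1.
# ASSUMPTION: the per-system draws below are hypothetical; Cerebras has not
# published a figure, only that it is well under a comparable GPU system.

SYSTEMS_PER_RACK = 3            # each CS-1 takes 15 rack units, a third of a rack
RACK_LIMITS_KW = (15, 20, 40)   # quoted estimates of high-density per-rack limits


def rack_power_kw(per_system_kw: float, systems: int = SYSTEMS_PER_RACK) -> float:
    """Total draw of a rack filled with `systems` CS-1s."""
    return per_system_kw * systems


def limits_exceeded(per_system_kw: float, limits=RACK_LIMITS_KW) -> list:
    """Return the per-rack limits that a fully populated rack would exceed."""
    total = rack_power_kw(per_system_kw)
    return [limit for limit in limits if total > limit]


def survives_failures(total_supplies: int, required: int, failed: int) -> bool:
    """N+N redundancy check, assuming any `required` supplies can carry the full load."""
    return total_supplies - failed >= required


if __name__ == "__main__":
    for per_system_kw in (10, 20, 30):  # hypothetical draws spanning "tens of kW"
        total = rack_power_kw(per_system_kw)
        print(f"{per_system_kw} kW per system -> {total} kW per rack, "
              f"exceeds limits {limits_exceeded(per_system_kw)}")
    # 6+6: six supplies needed, six spare
    print(survives_failures(12, 6, failed=6))  # True
    print(survives_failures(12, 6, failed=7))  # False
```

Even the lowest hypothetical figure puts a full rack at 30 kW, above the 15 to 20 kW end of the quoted range; at 20 kW per system, the rack would exceed every limit in it.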
The CS-1 is, unsurprisingly, water cooled. Two pumps move water through a manifold at the back of the WSE. The heated water then passes through a heat exchanger, and four fans exhaust the heat from the exchanger as warm air.
The Cerebras CS-1 system. At the very top left are the twelve Ethernet ports that connect it to datacentre infrastructure. Twelve power supplies are visible in the upper left quadrant, with the two water pumps for the cooling system at the top right. The bottom half of the case holds four cooling fans (Image: Cerebras)
The WSE itself is mounted in a specially designed package that Cerebras calls the “engine block”.
The engine block sandwiches step-down power modules in front of the motherboard, with the WSE and a cold plate behind it. The manifold directs cooling water across the back of this cold plate, drawing heat away from the wafer.
The large amount of current required means power has to be delivered through the main board, rather than at the edges of the wafer. This raises another problem: the board and the WSE have different coefficients of thermal expansion (CTE), so they expand at different rates as they heat up. Cerebras says it has solved this with a custom connector design that maintains electrical conductivity as the two move relative to each other. Unfortunately, the company did not provide further details on this connector.
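To give a sense of the scale of the CTE problem, the differential expansion follows ΔL = |α_board - α_wafer| × L × ΔT. The Python sketch below uses typical handbook values that are our assumptions, not Cerebras figures: about 2.6 ppm/K for silicon and about 16 ppm/K in-plane for an FR-4 circuit board (Cerebras has not said what its board is made of), a span of roughly 200 mm and a hypothetical 50 K temperature rise.

```python
# Differential thermal expansion between a circuit board and a silicon wafer.
# ASSUMED handbook values, not Cerebras figures:
ALPHA_SILICON_PER_K = 2.6e-6   # silicon, about 2.6 ppm/K near room temperature
ALPHA_FR4_PER_K = 16e-6        # FR-4 laminate in-plane, typically ~14-17 ppm/K


def differential_expansion_um(span_mm: float, delta_t_k: float,
                              alpha_board: float = ALPHA_FR4_PER_K,
                              alpha_wafer: float = ALPHA_SILICON_PER_K) -> float:
    """Relative growth, in micrometres, of board versus wafer over `span_mm`.

    delta_L = |alpha_board - alpha_wafer| * L * delta_T
    """
    delta_mm = abs(alpha_board - alpha_wafer) * span_mm * delta_t_k
    return delta_mm * 1000.0  # mm -> micrometres


if __name__ == "__main__":
    # Hypothetical: ~200 mm span, 50 K rise from idle to full load.
    print(f"{differential_expansion_um(200, 50):.0f} um across the span")      # 134 um
    # Contacts measured from the centre move by half that at the edges.
    print(f"{differential_expansion_um(100, 50):.0f} um from centre to edge")  # 67 um
```

On these assumed numbers, contacts near the edge of the wafer would shift by tens of micrometres relative to the board on every heating cycle, repeatedly straining a conventional rigid joint; that is why a connector that tolerates the movement matters.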
The Cerebras “engine block” housing its Wafer Scale Engine. On the left are the power pins, connected to step-down power modules. The motherboard, in red, is just visible in the centre of the sandwich. On the right is the brass manifold, which connects to the water pumps (the round disks) (Image: Cerebras)
The CS-1 is already in use at the US Department of Energy’s Argonne National Laboratory in Lemont, Illinois, the same site that will host the forthcoming Aurora exascale supercomputer. At Argonne, the CS-1 is used to apply huge neural networks to such complex problems as cancer drug response prediction, the properties of black holes and the understanding and treatment of traumatic brain injuries.
As more practical details of the system surrounding Cerebras’ wafer-scale device emerge, what seemed like an incredible idea begins to feel a lot more realistic. Besides Argonne, the CS-1 is also in use at Lawrence Livermore National Laboratory, and both labs will no doubt put it through its paces on real workloads. Are these practical, albeit research-focused, deployments evidence that Cerebras has effectively solved the wafer-scale challenges of yield, power and mismatched CTEs? It certainly looks like it.