Cray won a $600 million deal to build the El Capitan exascale system, sweeping all three next-gen supercomputers for U.S. government labs.
SAN JOSE, Calif. — Cray Inc. won a $600 million contract for the El Capitan exascale system, though its processor partner will be selected later. It is the third of three exascale supercomputers Cray will supply to the U.S. Department of Energy (DoE).
El Capitan will have a peak performance of more than 1.5 exaflops when it is commissioned in 2023. The system, estimated to draw about 30 MW, will use a combination of processors and accelerators to be chosen closer to the time of its assembly.
The DoE plans to spend a total of $1.8 billion on the three systems. In March, the DoE awarded Intel and Cray a contract to build Aurora at Argonne National Laboratory, expected to be the first exascale system in the U.S. It is slated to be running by the end of 2021 using future Xeon processors, Optane DIMMs, and Intel's Xe accelerator.
In May, the DoE awarded AMD and Cray a $600 million deal to build Frontier at Oak Ridge National Laboratory, also by the end of 2021. It will use AMD Epyc CPUs and Radeon GPUs as accelerators and, like El Capitan, will deliver more than 1.5 exaflops.
The world’s two fastest supercomputers today, Summit and Sierra, both use IBM Power9 processors and Nvidia GPU accelerators, delivering 148 and 94 petaflops, respectively. However, successors to those chips have not appeared in any of the DoE’s exascale contracts so far.
IBM is slated to describe a next-generation Power processor later this month at the Hot Chips conference. For its part, Cray said its Shasta systems are processor- and accelerator-agnostic.
Cray will supply a system architecture that includes its Slingshot interconnect for all three DoE exascale systems, as well as a new software platform for El Capitan. Slingshot is based on a Cray ASIC that delivers up to 200 Gbits/second per port and supports liquid cooling. The next-generation software supports containers orchestrated by Kubernetes, so that traditional high-performance computing jobs can run alongside newer machine-learning tasks.
Although Cray has swept the DoE’s exascale contracts, the bids from “a number of competitors” for El Capitan were “tremendously competitive,” said Bill Goldstein, director of Lawrence Livermore National Laboratory, which will host the system.
“Cray was best suited for the types of problems we have to solve and the best value for U.S. taxpayers — it was bang-for-the-buck competition,” he said.
For Cray, the deals mean it has booked about $1.5 billion in orders for its Shasta systems before the first units are delivered, said Cray chief executive Pete Ungaro.
El Capitan’s main mission will be to run classified models and simulations of U.S. nuclear weapons, the only way the weapons have been evaluated since the U.S. stopped live nuclear tests in 1992. The system’s performance will let researchers move some work now done in 2D to much-higher-resolution 3D simulations.
“We are currently redesigning both the warhead and delivery systems, and that’s the first time we’ve done that in 30 years. Every component of [them] must be redesigned and re-manufactured. This is a new kind of problem that can’t be done in 2D, and El Capitan is being delivered just in time to solve this problem,” said Goldstein.
The system will also run other classified jobs in areas such as cybersecurity. Eventually, parts of the system will be made available for unclassified jobs, Goldstein added.
The three U.S. exascale projects compete with three similar projects in China, and it remains unclear which country will win bragging rights for the first exascale system. In recent years, China has led the Top 500 list several times, and it now has more Top 500 supercomputers than the U.S.