Race to build the world's first exascale systems heats up as the U.S. Department of Energy awards Intel and Cray a contract for the first of three exascale-class supercomputers.
SAN JOSE, Calif. — The U.S. Department of Energy awarded Intel and Cray a contract for more than $500 million to build the first of three exascale-class supercomputers. Intel said that the system will be running before the end of 2021 using future Xeon processors, Optane DIMMs, and a so-called Xe product, believed to be a member of the GPU family in design under Raja Koduri.
The DoE plans to spend a total of $1.8 billion on the three systems. It is expected to announce soon that the team of IBM and Nvidia will build the other two systems using their future Power processors and GPUs.
The Intel/Cray system, called Aurora, will be built at Argonne National Laboratory using more than 200 Cray cabinets and will likely be the first of the three in service. The IBM/Nvidia systems would be follow-ons to Summit and Sierra, currently ranked as the two most powerful supercomputers in the world at 143 and 94 petaflops, respectively.
The DoE’s other two exascale systems, called Frontier and El Capitan, will be built at the Oak Ridge and Lawrence Livermore Labs, respectively, where the Summit and Sierra systems currently run. All of the U.S. systems are expected to deliver a peak performance of up to 1.3 exaflops, using up to 8 petabytes of memory and consuming about 40 MW.
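The stated targets imply a notable jump in energy efficiency. A back-of-the-envelope calculation, using only the peak-performance and power figures quoted above (1.3 exaflops at about 40 MW), illustrates the scale:

```python
# Rough energy efficiency implied by the DoE targets quoted in the
# article: ~1.3 exaflops peak at roughly 40 MW. Illustrative only.
peak_flops = 1.3e18   # 1.3 exaflops, peak 64-bit performance
power_watts = 40e6    # ~40 MW system power

gflops_per_watt = peak_flops / power_watts / 1e9
print(f"~{gflops_per_watt:.1f} gigaflops per watt")
```

This works out to roughly 32 gigaflops per watt, a useful yardstick when comparing the planned machines against today's Top 500 leaders.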
Representatives of Intel and Argonne declined to reveal the size, power consumption, and architecture of Aurora or details of the Xe accelerator. The system will use the Slingshot fabric in Cray's Shasta systems and run a combination of systems software from Cray and a set of yet-to-be-defined programming libraries that Intel calls OneAPI.
The 2021 target could mean a fairly tight design cycle for the new Intel GPU in the works, said veteran graphics analyst Jon Peddie. Alternatively, the Aurora accelerator might be an FPGA, a multicore x86 part replacing the discontinued Xeon Phi, or a hybrid. Intel could also leverage technology that it acquired with Nervana and intends to ship later this year in versions for AI training and inference.
Worldwide, at least eight major exascale systems are in the works, including the three U.S. efforts. Three projects are competing in China to launch as early as 2020, although experts believe that they may not get up and running until 2021.
“It might be a nail biter” to see who gets bragging rights of having the first exascale system, said Jack Dongarra, a co-author of the Top 500 supercomputer list and a professor at the University of Tennessee.
A supercomputer roadmap for U.S. national labs.
A quick survey of exascale efforts around the globe
One exascale project in China is a follow-on to the Sunway TaihuLight in Wuxi, currently the world's third most powerful system. It uses a whopping 10.6 million proprietary cores to deliver 93 petaflops.
The second China effort is a follow-on to the Tianhe-2A in Guangzhou, which currently uses Xeon CPUs and Matrix-2000 accelerators designed by China's National University of Defense Technology; it is ranked fourth worldwide at 61 petaflops. China's third exascale effort is a new project led by server maker Sugon using x86 chips believed to be derived from AMD's Zen core as part of a 2016 joint venture.
Elsewhere, researchers in Europe and Japan have separate exascale efforts, both based on Arm cores. Fujitsu is building the Post-K system and Bull is leading the European effort.
All of the systems aim to handle a mix of traditional high-performance computing jobs, such as complex simulations, as well as emerging workloads based on deep learning and other analytics algorithms. As such, the new systems will be important proving grounds for accelerator technologies already popular in the Top 500.
So far, Intel has listed only generic ingredients for Aurora.
Nvidia GPUs are, by far, the most popular accelerators in Top 500 systems, distantly followed by Intel's Xeon Phi, which is presumably now being replaced by Intel's next-generation GPU. Intel dominates in CPUs, appearing in more than 95% of the systems.
The systems are typically measured with the Linpack benchmark, which uses 64-bit floating-point math. Deep-learning jobs typically use lower-precision arithmetic, meaning that those tasks could effectively see performance in the range of multiple exaflops on the new systems, said Dongarra.
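Dongarra's point can be sketched with simple arithmetic. The example below assumes, purely for illustration, that halving the word size doubles throughput; real gains depend heavily on the hardware and are often larger for dedicated low-precision units:

```python
# Illustrative (assumed) scaling of peak throughput with numeric
# precision, starting from a nominal 1-exaflop FP64 machine.
# Actual per-precision speedups vary by architecture.
fp64_exaflops = 1.0

for bits in (64, 32, 16):
    effective = fp64_exaflops * (64 / bits)
    print(f"FP{bits}: ~{effective:.0f} exaflops")
```

Under that simple assumption, a machine benchmarked at one exaflop in 64-bit Linpack would deliver on the order of four exaflops for 16-bit deep-learning work, which is the "multiple exaflops" range Dongarra describes.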
Exascale systems are also expected to break new ground in optical networks and interconnects. For example, Intel said that it plans to use silicon photonics in the Aurora system.
“In general, [the new systems] will create another wave of acceleration across many areas of science, technology, and health care,” said Rick Stevens, an associate lab director at Argonne, announcing the Intel/Cray deal.
Aurora will handle a broad range of jobs for DoE and academic researchers. They include materials research in batteries and photovoltaics, earthquake and climate forecasting, and optimizing efficiency of wind turbines, as well as studying cancer and traumatic brain injuries in veterans.