Nvidia introduced a reference design platform for GPU-accelerated Arm-based servers. Arm, Ampere, Cray, Fujitsu, HPE, and Marvell are all going to build such servers for a variety of applications including, of course, supercomputing, the GPU vendor announced at Supercomputing 2019 in Denver.

Nvidia has been casually experimenting with Arm processors for supercomputers for at least eight years, but its full support had previously been reserved for the x86 and Power ecosystems. Last summer it vowed to fully support Arm with its CUDA-X software platform, and today’s announcement delivers on that promise. The company is initially making its Arm-compatible software development kit available as a preview.

Nvidia had two other announcements at the conference. One was the introduction of a software suite called Magnum IO that is “optimized to eliminate storage and input/output bottlenecks” in data centers by creating a path for GPUs to access data from memory directly, gaining a performance boost largely by bypassing CPUs. The other was the availability of a new kind of GPU-accelerated supercomputer in the cloud, accessible through Microsoft Azure.

Reference server
Arm is increasingly being used in supercomputing. Nvidia reported that Oak Ridge and Sandia National Laboratories (U.S.), the University of Bristol (U.K.), and Riken (Japan) have all begun testing GPU-accelerated Arm-based computing systems. Arm’s success in the supercomputing sector has caught the attention of hyperscale-cloud operators and enterprises.

Nvidia's Arm server reference design

The Arm architecture was designed for high throughput, low application latency, and power management. Arm licensees note that this mix of features is particularly germane to artificial intelligence (AI) and machine learning (ML) workloads in data centers.

One challenge for Nvidia is ensuring that its GPUs work with each company’s individual Arm implementation. In a private briefing with the media, Nvidia made it clear that it is going through the rigorous process of ensuring that integration with the likes of Ampere, Fujitsu and Marvell will be smooth.

Marvell issued a separate announcement about Nvidia GPU support of its ThunderX family of Arm-based server processors, which it said were particularly suitable for AI and ML.

Nvidia noted that its reference platform “also benefits from strong collaboration with Cray, a Hewlett Packard Enterprise company, and HPE, two early providers of Arm-based servers.”

The GPU-accelerated server platform includes CUDA-X libraries and development tools for accelerated computing. Nvidia said a wide range of high-performance computing (HPC) software companies have used its CUDA-X libraries to build GPU-enabled management and monitoring tools that run on Arm-based servers.

Magnum IO
Data centers have been experimenting with different approaches to interconnect to accelerate their operations. That’s where Nvidia’s Magnum IO comes in.

Its key feature is GPUDirect, which Nvidia said provides a path for data to bypass CPUs and travel on “open highways” offered by GPUs, storage and networking devices. Another element is GPUDirect Storage, which Nvidia said enables researchers to bypass CPUs when accessing storage and quickly access data files for simulation, analysis or visualization.

In the media briefing, Nvidia said Magnum IO delivers up to 20x faster data processing, based on the TPC-H benchmark – a level of performance Nvidia said was suitable to carry out complex financial analysis, climate modeling and other HPC workloads.

The company said it developed Magnum IO with DataDirect Networks, Excelero, IBM, Mellanox and WekaIO.

The Nvidia DGX SuperPOD

Magnum IO software is available now, with the exception of GPUDirect Storage, which is currently available to select early-access customers. Broader release of GPUDirect Storage is planned for the first half of 2020, the company said.

Azure supercomputer
Nvidia also announced the availability of what it described as a new kind of GPU-accelerated supercomputer. Azure’s new NDv2 instance offers up to 800 Nvidia V100 Tensor Core GPUs interconnected on a single Mellanox InfiniBand backend network.

The most basic NC6 (6-core) instance is available for 90 cents an hour, or 30 cents an hour with a long-term subscription.

As a practical matter, what does that mean? Nvidia and Microsoft used 64 NDv2 instances on a pre-release version of the cluster to train BERT (a commonly used conversational AI model) in roughly three hours.

Nvidia said all NDv2 instances benefit from GPU-optimized HPC applications, machine learning software and deep learning frameworks such as TensorFlow, PyTorch and MXNet, all available from the NVIDIA NGC container registry and Azure Marketplace.