Graphcore Takes on Nvidia with New AI Chip

Article By: Sally Ward-Foxton

Graphcore takes on Nvidia with four-chip IPU Machine designed to compete directly with Nvidia DGX-A100...

British startup Graphcore has unveiled its second-generation IPU (intelligence processing unit), the Colossus Mark 2, an enormous 59.4 billion-transistor device designed to accelerate AI workloads in the data center. The company also launched a 1U server blade for data centers, which incorporates four of the Colossus Mark 2 chips and can scale to supercomputer levels. The new offering is designed to place Graphcore in firm competition with market leader Nvidia for large-scale data center AI acceleration.

Graphcore Mark 2 Chip

Graphcore’s Colossus Mark 2 chip (Image: Graphcore)

Graphcore’s Mark 1 device was released in 2018. Mark 2, which has migrated from TSMC 16nm to TSMC 7nm, achieves 250 TFlops with 1472 independent processor cores. The new chip has three times the on-chip RAM: 900MB, up from 300MB in the previous version. Graphcore’s figures put overall performance at roughly 8x that of the Mark 1. Comparing eight Mark 2 IPUs with eight Mark 1 IPUs, Graphcore says BERT training is 9.3x faster, BERT-3Layer inference is 8.5x faster and EfficientNet-B3 training is 7.4x faster.

The IPU Machine (part number M2000) is a 1U server blade with four Colossus Mark 2 chips on it, offering a Petaflop of AI compute at FP16 precision.
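The petaflop figure follows from the per-chip number quoted above. A minimal sketch of the arithmetic, assuming the 250 TFlops figure is the per-chip FP16 number (the constant and function names are illustrative, not Graphcore's):

```python
# Figures as quoted in the article; all names here are hypothetical.
TFLOPS_PER_CHIP_FP16 = 250   # Colossus Mark 2 (assumed to be the FP16 figure)
CHIPS_PER_MACHINE = 4        # IPU Machine (M2000)
ON_CHIP_MB_PER_CHIP = 900    # on-chip RAM per Mark 2 chip


def machine_totals(num_machines: int = 1) -> dict:
    """Aggregate chip count, FP16 compute and on-chip RAM for N IPU Machines."""
    chips = num_machines * CHIPS_PER_MACHINE
    return {
        "chips": chips,
        "fp16_petaflops": chips * TFLOPS_PER_CHIP_FP16 / 1000,
        "on_chip_gb": chips * ON_CHIP_MB_PER_CHIP / 1000,
    }


print(machine_totals(1))  # {'chips': 4, 'fp16_petaflops': 1.0, 'on_chip_gb': 3.6}
print(machine_totals(8))  # {'chips': 32, 'fp16_petaflops': 8.0, 'on_chip_gb': 28.8}
```

This is peak arithmetic only. It counts on-chip RAM alone and says nothing about delivered performance on a real workload.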

“This is really the product that Graphcore has been working on since we started the company and that we have wanted to produce,” said Graphcore CEO Nigel Toon.

“The innovations are more than just going from TSMC 16nm to 7nm, the other innovations such as on chip RoCE and new AI number format plus more all add up. It keeps Graphcore ahead of Nvidia’s latest Ampere [offering] so it’s important timing for Graphcore,” said Michael Azoff, Chief Analyst, Kiasco Research.

Toon presented a side-by-side comparison of what Graphcore offers at a similar price point to Nvidia’s DGX-A100 system. Launched a couple of months ago, the DGX-A100 is powered by eight state-of-the-art 7nm Ampere A100 GPUs. A similar budget will buy eight IPU Machines (32 IPU chips in total), occupying 8U compared to the DGX-A100’s 6U. According to Graphcore’s figures, its system offers 12x the FP32 (AI training) compute and 3x the FP16 compute. It would also offer 10x the memory, allowing much bigger models to be supported. Overall, Graphcore believes such a system would offer a 16x performance advantage when training EfficientNet.

“[This would translate to] either much lower cost, less power or faster training, whichever parameter is most important for customers,” Toon said.

Graphcore takes on Nvidia: side by side comparison
Graphcore’s comparison of what they offer for around the same price as an Nvidia DGX-A100 (Source: Graphcore). Note that an Nvidia DGX-A100 is 6U compared to 8x Graphcore IPU Machines at 8U.

“The second generation Graphcore IPU is impressive from a performance standpoint with three times more memory, but I think easy scalability is perhaps its greatest feature,” said Karl Freund, senior analyst for AI at Moor Insights & Strategy. “The new fabric extends processing to literally thousands of IPUs, while the new IPU Machine enables a plug-and-play scalable infrastructure. With this new product, Graphcore may now be first in line to challenge Nvidia for data center AI, at least for large-scale training.”

While the IPU Mark 1 had IPU-Links to connect multiple chips together, Graphcore has built a new IPU-Fabric chip that supports building systems of up to 64,000 IPUs. IPU Machines can be connected directly, box-to-box, but also through switches to build out even larger configurations. Crucially, IPU Machines can be added to existing data center infrastructure without extra servers or networking cards.

Graphcore takes on Nvidia with IPU Machine

Graphcore’s IPU Machine, a 1U server blade with four of the new chips (Image: Graphcore)

“Unlike normal Ethernet switching, where everything connects to everything else, that’s not required in AI,” Toon explained. “So we have built a ring system for communication and we support collectives and all-reduce operations across those links to reduce the amount of communication that is actually required, and to reduce the power that is taken by the communications piece.”
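The article does not describe how Graphcore implements its collectives. As an illustration of the general idea behind an all-reduce over a ring, here is a small simulation of the textbook ring all-reduce (a reduce-scatter pass followed by an all-gather pass). It is a generic sketch, not Graphcore's implementation, and the function name is hypothetical:

```python
def ring_all_reduce(node_data):
    """Sum equal-length vectors held by n nodes arranged in a ring.

    node_data[i] is node i's local vector; its length must be divisible by n.
    Returns the per-node vectors after the all-reduce (every node holds the sum).
    """
    n = len(node_data)
    size = len(node_data[0])
    assert size % n == 0, "vector length must be divisible by node count"
    chunk = size // n
    data = [list(v) for v in node_data]  # copy so the inputs are untouched

    def span(c):
        # Index range covered by chunk c
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, node i sends chunk (i - s) % n
    # to its right-hand neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(i, (i - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in span(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, val in zip(span(c), payload):
                data[dst][k] += val

    # Phase 2: all-gather. Node i now holds the complete sum for chunk
    # (i + 1) % n; completed chunks circulate and overwrite stale copies.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in span(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, val in zip(span(c), payload):
                data[dst][k] = val

    return data


vectors = [[1, 2, 3, 4], [10, 20, 30, 40],
           [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
print(ring_all_reduce(vectors)[0])  # [1111, 2222, 3333, 4444]
```

In this scheme each node only ever talks to its neighbour, and each node transmits roughly 2(n-1)/n times the vector size in total, so per-node traffic stays nearly constant as the ring grows.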

Other improvements in the Mark 2 include updated Poplar software, a new high-speed memory technology and improved libraries that support sparse models.

Sales channel

Graphcore also announced Atos as a new channel partner for the IPU Machine (first-generation accelerator cards are already available in Dell EMC servers). Atos said it is already planning IPU clusters with its customers, who are largely European laboratories and scientific institutions. Toon also mentioned that Graphcore expects to announce a new channel partner program in September which will allow more OEMs to offer the IPU Machine as a white-label product.

Colossus Mark 1 products will continue to be available for now, but the Mark 2 products will ultimately supersede them, according to Toon.
