LONDON — Graphcore’s AI accelerator chip, the Colossus intelligence processing unit (IPU), is now available to customers as part of Microsoft’s Azure cloud platform.

This is the first time any major cloud service provider has publicly offered customers the chance to run workloads on an accelerator from any of the dozens of AI chip startups, and as such it represents a big win for Graphcore. Microsoft has said access will initially be prioritised for customers who are “pushing the boundaries of machine learning”.

Microsoft Azure customers now have access to Graphcore’s IPU AI accelerator (Image: Graphcore)

Microsoft and Graphcore have been working together for two years to develop cloud systems and build enhanced vision and natural language processing models for the Graphcore IPU. A particular focus has been Google’s BERT (bidirectional encoder representations from transformers), a natural language processing (NLP) model currently very popular with search engines, including Google itself.


Using eight Graphcore IPU processor cards (each with a pair of Colossus accelerators), BERT can be trained in 56 hours, comparable to the result for a GPU running PyTorch, and faster than a GPU running TensorFlow (see graph below). Graphcore says customers are seeing BERT inference throughput increase threefold, with a 20% improvement in latency.

Given the level of hype surrounding Graphcore — the company is valued at $1.7 billion — these performance improvements seem rather modest. It remains to be seen whether the promised improvement is enough to tempt customers into optimising their models for the IPU.

Training results for BERT on Graphcore IPU versus GPUs running PyTorch and TensorFlow (Image: Graphcore)

Inference results for BERT on Graphcore IPU versus GPU (Image: Graphcore)

Advanced models
Alongside the Azure announcement, Graphcore released results on more advanced models, where it showed more dramatic performance improvements.

Inference on the image processing model ResNext delivered 3.4x higher throughput at 18x lower latency, compared to a GPU solution consuming the same amount of power. ResNext uses a technique called group separable convolutions, which splits convolution filters into smaller separable blocks to increase accuracy while reducing the parameter count. This approach is well-suited to the IPU, Graphcore says, because of the chip’s massively parallel processor architecture and more flexible, high-throughput memory; smaller blocks of data can be mapped to thousands of fully independent processing threads.
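To see why grouped convolutions shrink the parameter count, note that each output channel only convolves the input channels within its own group. A minimal sketch of the arithmetic (the layer sizes here are illustrative, chosen to match the 32-group blocks ResNext is known for, not taken from Graphcore’s benchmark):

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution layer (biases ignored).

    With grouping, each output channel sees only c_in // groups
    input channels, so the weight count drops by a factor of `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k

standard = conv_params(256, 256, 3)             # dense 3x3 layer: 589,824 weights
grouped = conv_params(256, 256, 3, groups=32)   # 32 groups: 18,432 weights
print(standard // grouped)                      # 32x fewer parameters
```

Each of the 32 groups is an independent small convolution, which is the property Graphcore says maps well onto the IPU’s independent processing threads.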

Graphcore’s IPU outperforms the GPU solution on ResNext-101 inference, with higher throughput and lower latency (Image: Graphcore)

Graphcore also showed good results for Markov Chain Monte Carlo (MCMC)-based models, probabilistic algorithms used for modelling financial markets. This type of model has been out of reach for many in the finance industry, as it was previously considered too computationally expensive to use, said Graphcore. Early-access IPU customers in the finance sector have been able to train their proprietary, optimised MCMC models in 4.5 minutes on IPUs, compared to over 2 hours with their existing hardware, a 26x speed-up in training time.
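MCMC methods approximate a probability distribution by taking a guided random walk and accepting or rejecting each step. The following is a minimal Metropolis sampler for a one-dimensional target; it is purely illustrative of the technique and bears no relation to the proprietary financial models Graphcore’s customers run:

```python
import math
import random

def metropolis(log_prob, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis: sample from a 1-D density given its
    log-probability (up to an additive constant)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)      # symmetric proposal
        # Accept with probability min(1, p(proposal) / p(x))
        if math.log(rng.random()) < log_prob(proposal) - log_prob(x):
            x = proposal
        samples.append(x)                        # rejected steps repeat x
    return samples

# Target: a standard normal, log p(x) = -x^2/2 + const.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(samples) / len(samples)               # should be close to 0
```

The computational cost Graphcore refers to comes from needing very many such steps, each cheap but sequential per chain; the usual way to parallelise is to run many independent chains at once.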

Reinforcement learning (RL), another popular technique in modern AI algorithm development, can also be accelerated compared to typical existing solutions. Graphcore cited a tenfold improvement in throughput for RL models, even before they are optimised for the IPU.