Efficient Processor-in-Memory Chip Accelerates AI Inference

By Sally Ward-Foxton

This technique can enable extremely energy efficient AI inference and will be available on GloFo’s 22FDX process...

Imec and GlobalFoundries have demonstrated a processor-in-memory chip that can achieve energy efficiency up to 2900 TOPS/W, approximately two orders of magnitude above today’s commercial processor-in-memory chips. The chip uses an established idea, analog computing, implemented in SRAM in GlobalFoundries’ 22nm fully-depleted silicon-on-insulator (FD-SOI) process technology. Imec’s analog in-memory compute (AiMC) will be available to GlobalFoundries customers as a feature that can be implemented on the company’s 22FDX platform.

Imec Processor-In-Memory Chip
Imec’s AnIA test chip, seen here mounted on the PCB used for measurement and characterization, can achieve up to 2900 TOPS/W (Image: Imec)

Analog compute
Analog compute, or processor-in-memory, is an established technique that is already used in commercial AI accelerator chips from startups including Mythic, Syntiant and Gyrfalcon.


Since a neural network model may have tens or hundreds of millions of weights, sending data back and forth between the memory and the processor is inefficient. Analog computing uses a memory array to store the weights and also perform multiply-accumulate (MAC) operations, so there is no memory-to-processor transfer needed. Each memristor element (perhaps a ReRAM cell) has its conductance programmed to an analog level which is proportional to the required weight.

Applying a voltage proportional to the input activation (via the digital-to-analog converters on the left of the diagram below) means the current through each element is proportional to the product of the activation and the weight. The current through each bit-line (the vertical lines in the diagram) is the sum of these activation-weight products, which can then be fed through an analog-to-digital converter. This sum of activation-weight products is the multiply-accumulate result at the heart of neural network inference.

Processor-In-Memory Analog Compute Diagram
Analog computing uses an array of memristor cells to calculate matrix vector multiplication without having to send data between memory and processor (Image: Imec)
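The crossbar behaviour described above can be modelled in a few lines. The following is a minimal sketch of an idealized array, not Imec's implementation: it ignores noise, wire resistance, and DAC/ADC quantization, and the function name `crossbar_mvm` and all numeric values are illustrative assumptions.

```python
# Idealized crossbar model: conductances G[i][j] store the weights and
# input voltages V[i] carry the activations. Values are illustrative only.

def crossbar_mvm(conductances, voltages):
    """Return the current on each bit-line of an ideal crossbar.

    conductances[i][j] is the cell on input row i and bit-line (column) j.
    Ohm's law gives the cell current V[i] * G[i][j]; Kirchhoff's current
    law sums the cell currents that share a bit-line.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("need one input voltage per row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            currents[j] += voltages[i] * conductances[i][j]
    return currents


# Three input rows, two bit-lines (hypothetical values, arbitrary units).
G = [[1.0, 0.0],
     [0.5, 2.0],
     [0.0, 1.0]]
V = [2.0, 4.0, 1.0]
bitline_currents = crossbar_mvm(G, V)
print(bitline_currents)  # [4.0, 9.0]
```

Each output is one column of a matrix-vector product. In the analog array, every column is summed at once by the physics of the bit-line rather than by a loop.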

“In practice, many options are possible besides ReRAM — we can use MRAM, Flash, DRAM… the objective of this program is to understand which is best for the application and to optimize the options for each application domain,” explained Diederik Verkest, program director for machine learning at Imec.

Test chip
Ioannis Papistas (Image: Imec)

Imec has built a test chip, called analog inference accelerator (AnIA), based on GlobalFoundries’ 22nm FD-SOI process. AnIA’s 512k array of SRAM cells plus digital infrastructure including 1024 DACs and 512 ADCs takes up 4 mm². It can perform around half a million computations per operation cycle based on 6-bit (plus sign bit) input activations, ternary weights (-1, 0, +1) and 6-bit outputs.
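The quoted figures are mutually consistent if the array is organized as 1024 input rows (one per DAC) by 512 output columns (one per ADC), with one multiply-accumulate per cell per cycle. That organization is an assumption, since the article does not spell it out. The sketch below checks the arithmetic and gives an exact digital reference for what a single column would accumulate. The helper name `ternary_column_mac` is hypothetical.

```python
N_DACS = 1024          # input rows, from the article
N_ADCS = 512           # output columns, from the article
INPUT_MAX = 2**6 - 1   # 6-bit magnitude plus a sign bit: -63..+63

# Assumption: one SRAM cell per (row, column) pair, one MAC per cell per cycle.
macs_per_cycle = N_DACS * N_ADCS


def ternary_column_mac(activations, weights):
    """Exact digital reference for the sum one bit-line would accumulate."""
    if len(activations) != len(weights):
        raise ValueError("need one weight per activation")
    total = 0
    for a, w in zip(activations, weights):
        if not -INPUT_MAX <= a <= INPUT_MAX:
            raise ValueError("activation outside signed 6-bit range")
        if w not in (-1, 0, 1):
            raise ValueError("weight must be ternary")
        total += a * w
    return total


# Largest possible column sum: every input at +63, every weight at +1.
worst_case = ternary_column_mac([INPUT_MAX] * N_DACS, [1] * N_DACS)
example = ternary_column_mac([5, -3, 63, 0], [1, -1, 0, 1])
print(macs_per_cycle, worst_case, example)
```

The product 1024 × 512 is 524,288, matching both the "512k" array and "around half a million" computations per cycle. The worst-case column sum of 64,512 is far wider than a 6-bit output can represent, so the ADC stage must scale or saturate the result. The article does not describe how AnIA handles this.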

“We are able to produce the matrix vector multiplication output at different supply voltages, 0.8 and 0.6V,” said Ioannis Papistas from Imec’s machine learning group. “Operating at lower supply voltages without affecting the accuracy of the operation can significantly reduce the power consumption of operation, which is especially important for inference in energy constrained systems. This is an important feature of our design, enabled by the 22FDX process, that enables competitive inference on the edge.”

Imec showed results for object recognition inference on the CIFAR-10 dataset, where accuracy dropped by only one percentage point compared to a similarly quantised baseline. With a supply voltage of 0.8 V, AnIA’s energy efficiency is between 1050 and 1500 TOPS/W at 23.5 TOPS. For 0.6 V supply voltage, AnIA achieved 5.8 TOPS at around 1800-2900 TOPS/W.
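Dividing throughput by efficiency gives a rough sense of the power these figures imply. The sketch below assumes each TOPS and TOPS/W pair describes the same operating point. The resulting milliwatt and femtojoule values are derived here and are not stated by Imec.

```python
def power_mw(throughput_tops, efficiency_tops_per_w):
    """Implied power draw in milliwatts: (TOPS) / (TOPS/W) = W."""
    return throughput_tops / efficiency_tops_per_w * 1000.0


def energy_per_op_fj(efficiency_tops_per_w):
    """Energy per operation in femtojoules.

    1 TOPS/W is 1e12 operations per joule, i.e. 1 pJ (1000 fJ) per operation.
    """
    return 1000.0 / efficiency_tops_per_w


# (supply voltage, TOPS, low TOPS/W, high TOPS/W), as quoted in the article.
operating_points = [
    (0.8, 23.5, 1050, 1500),
    (0.6, 5.8, 1800, 2900),
]

for volts, tops, eff_low, eff_high in operating_points:
    # Higher efficiency means lower power, so the bounds swap.
    p_min = power_mw(tops, eff_high)
    p_max = power_mw(tops, eff_low)
    print(f"{volts} V: {p_min:.1f} to {p_max:.1f} mW, "
          f"{energy_per_op_fj(eff_high):.2f} to "
          f"{energy_per_op_fj(eff_low):.2f} fJ/op")
```

Under that assumption, the array would draw roughly 16 to 22 mW at 0.8 V and roughly 2 to 3 mW at 0.6 V, with each operation costing less than a femtojoule.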

Processor-In-Memory Chart
Energy efficiency for various AI accelerators compared to Imec’s AnIA test chip (Image: Imec)

Mainstream innovation
“The innovation [Imec presented] is going to become mainstream,” said Hiren Majmudar, VP and GM of GlobalFoundries’ computing business unit. “We are seeing partners, customers of GlobalFoundries who are in the post-production stage with validated silicon… we expect that analog compute-based silicon will be hitting production around the end of this year or early next year. In terms of the mass market deployment, we anticipate analog compute to start getting into mass market certainly no later than 2022. But it could potentially happen sooner than that.”

Diederik Verkest (Image: Imec)

GlobalFoundries is working to include Imec’s AiMC technology as a feature that can be implemented on the 22FDX platform to enable energy-efficient AI accelerators. The FD-SOI process is designed for low power consumption, with the ability to operate down to 0.5 V and an ultralow standby leakage of 1 picoamp per micron. 22FDX with the new AiMC feature is in development at GlobalFoundries’ 300mm production line at Fab 1 in Dresden, Germany.

As for Imec, the machine learning program will continue. The group’s ambition is to reach 10,000 TOPS/W (10 TOPS below 100 mW) for always-on smart sensors and consumer wearables, said Verkest.

“In our ML program, our next steps are to reduce the size of these compute cells and to start looking at emerging memory devices as a next generation implementation for this principle,” he said.
