« Previously: Road to innovating machine learning chips

CEA’s ambition is to develop a neuromorphic circuit. The research institute believes such a chip is “a valid complement to deep learning to extract information from data close to the sensors.” Before achieving that goal, CEA sees a few more interim steps. Development tools like N2D2 are paramount for chip designers to develop customised solutions for “high TOPS (tera operations per second) per Watt performance DNN.”

Further, those looking to use DNNs in edge computing can experiment on an actual piece of hardware. For that, CEA is offering an ultra-low power programmable accelerator called P-Neuro.

The current P-Neuro neural network processing unit is based on an FPGA. However, CEA is turning that FPGA design into an ASIC, according to Duranton.

In its lab, Duranton demonstrated a face-detection convolutional neural network application on the FPGA-based P-Neuro. The demo compared P-Neuro with embedded CPUs (quad ARM cores on Raspberry Pi; Samsung Exynos running on Android), all running the same embedded CNN application, tasked with “face extraction” from a database of 18,000 images.

At the time of the demonstration, P-Neuro processed 6,942 images per second, with energy efficiency of 2,776 images per Watt. Compared to an embedded GPU (Tegra K1), the FPGA-based P-Neuro running at 100MHz proved to be twice as fast and four to five times more energy efficient.
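As a rough cross-check of those figures, the short Python sketch below works out what they imply. It assumes that "images per Watt" means images per second per watt, which the article does not state explicitly. The function names are illustrative only, and the Tegra K1 numbers are derived from the quoted ratios, not measured values.

```python
# Figures quoted in the article for the FPGA-based P-Neuro running at 100MHz.
P_NEURO_IMAGES_PER_SECOND = 6942.0
# Assumption: the article's "images per Watt" means images per second per watt.
P_NEURO_IMAGES_PER_SECOND_PER_WATT = 2776.0


def implied_power_watts(images_per_second, images_per_second_per_watt):
    """Return the power draw implied by a throughput and an efficiency figure."""
    return images_per_second / images_per_second_per_watt


def implied_baseline(images_per_second, images_per_second_per_watt,
                     speedup, efficiency_gain):
    """Back out a baseline's throughput and efficiency from the stated ratios."""
    baseline_rate = images_per_second / speedup
    baseline_efficiency = images_per_second_per_watt / efficiency_gain
    return baseline_rate, baseline_efficiency


p_neuro_power = implied_power_watts(
    P_NEURO_IMAGES_PER_SECOND, P_NEURO_IMAGES_PER_SECOND_PER_WATT
)
print(f"P-Neuro implied power: {p_neuro_power:.2f} W")

# The article quotes a 2x speed advantage and a 4x to 5x efficiency advantage
# over the Tegra K1, so both ends of the efficiency range are evaluated.
for efficiency_gain in (4.0, 5.0):
    gpu_rate, gpu_efficiency = implied_baseline(
        P_NEURO_IMAGES_PER_SECOND,
        P_NEURO_IMAGES_PER_SECOND_PER_WATT,
        speedup=2.0,
        efficiency_gain=efficiency_gain,
    )
    gpu_power = implied_power_watts(gpu_rate, gpu_efficiency)
    print(
        f"Efficiency gain {efficiency_gain:.0f}x: implied GPU rate "
        f"{gpu_rate:.0f} images/s, efficiency {gpu_efficiency:.1f} "
        f"images/s/W, power {gpu_power:.2f} W"
    )
```

Under that reading, the quoted numbers correspond to a P-Neuro power draw of about 2.5 W, and to a GPU baseline drawing roughly 5 W to 6.25 W at about half the throughput.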

P-Neuro is built on a clustered SIMD architecture, featuring an optimised memory hierarchy and interconnect.
Figure 1: P-Neuro's diagram showing optimised memory hierarchy and interconnect.

Working on new projects

For CEA researchers, P-Neuro “is a chip for short term,” stressed Duranton. P-Neuro is built on a full CMOS device using binary coding. The team is also working on a full CMOS device using spike coding.

But to take full advantage of advanced devices and overcome density and power limitations, the team has set its goals higher.

During the interview, Carlo Reita, director, Nanoelectronics Technical Marketing and Strategy at CEA-Leti, said that it’s critical to leverage the technologies developed for advanced silicon devices and memories in designing physical implementation on dedicated components. One route is conventional and monolithic 3D integration, using CEA-Leti’s CoolCube. Another is the use of Resistive RAM as synaptic elements, said Reita. Advanced device technologies such as FD-SOI and nanowires also come into play.

Meanwhile, the EU, as part of its Horizon 2020 program, is seeking “to fabricate a chip implementing a neuromorphic architecture that supports state-of-the-art machine learning and spike-based learning mechanisms.”

The project, called NeuRAM3, has said its chip will feature “an ultra-low power, scalable and highly configurable neural architecture.” The goal is to deliver “a gain of a factor 50x in power consumption on selected applications compared to conventional digital solutions.”

CEA is deeply involved in the project, explained Reita. CEA’s own research goals are tightly aligned with the mission of the NeuRAM3 project, which includes the development of monolithically integrated 3D technology in FD-SOI, and the use of integrated RRAM synaptic elements.

Reita explained that under the NeuRAM3 project, the new mixed-signal multi-core neuromorphic device should be able to significantly reduce power consumption compared to IBM’s brain-inspired chip, TrueNorth.
Figure 2: Device's comparison to IBM’s brain-inspired TrueNorth.

Other participants in the NeuRAM3 project include IMEC, IBM Zurich, STMicroelectronics, CNR (the National Research Council in Italy), IMSE (El Instituto de Microelectrónica de Sevilla in Spain), the University of Zurich and Jacobs University in Germany.

First published by EE Times.

 