Homegrown CNN Accelerator Moves AI to the Edge

Article By : Sally Ward-Foxton

Maxim plans to combine its new CNN accelerator with its other IP for a range of AI-enabled application-specific SoCs...

Maxim Integrated has released its first SoC with a dedicated on-chip AI accelerator. The MAX78000 device, which uses convolutional neural network (CNN) accelerator IP developed in-house, is capable of around 30 GOPS at ultra-low power consumption. This is ample compute to run applications such as facial recognition and keyword spotting at power consumption levels that meet the tight budgets of battery-powered wearables and IoT devices.

Maxim has been making microcontrollers for more than 20 years, but the company has focused on wearables and IoT devices in recent years. An AI accelerator from Maxim may have come as a surprise to some industry watchers, but it shouldn’t have, says Kris Ardis, executive director for the Micros, Security and Software Business Unit at Maxim Integrated.

“We’re maniacally focused on energy, and trying to extend devices’ battery life,” Ardis told EE Times. “We also have a lot of experience with throwing really complex, dedicated hardware at things like cryptographic blocks, things that will make those complex mathematical equations run faster and with lower energy.”

Ardis described how an internal happy-hour autonomous robot racing competition five years ago was the springboard for the company’s interest in machine learning. Maxim engineers in the Dallas office (Ardis is a veteran of Dallas Semiconductor, acquired by Maxim in 2001) would build their own robots to race around a wooden maze. Machine learning became a passion project, which became a skunkworks chip, which has now become a production chip. Today, the machine learning team is split between Dallas and Istanbul.

Two cores

The MAX78000’s CNN accelerator is complemented by two microcontroller cores, which control the system and get the data in and out of the CNN accelerator (they are not involved in the neural network computation). One is an Arm Cortex-M4F and one is a Maxim implementation of a low-power RISC-V core.
The reasons for using a RISC-V core are not only financial, Ardis said.
The MAX78000 features an Arm Cortex-M4F core, a power-optimised RISC-V core and an in-house developed CNN accelerator (Image: Maxim)
“We expect customers to start programming on the M4F, as they are prototyping with maybe a camera or an audio chip,” Ardis said. “But when they’re really trying to squeeze the energy out, the RISC-V is the right core to do that. It can manage things in a lower power manner. The other thing we use that for is sometimes the data needs some massaging… you might want to change the endianness, make the image black and white, or something like that. We considered designing hardware around these cases but the possibilities got too endless, so we put in the lowest power processor we could to help with that data massaging job.”

CNN accelerator

Maxim’s CNN accelerator has 64 parallel processors, each with a pooling unit and a convolution engine with dedicated weight memory. Four processors share one data memory, and groups of 16 share common controls. Supported operations include 1D and 2D convolution, and the chip supports 1-, 2-, 4- and 8-bit weights (1-bit weights, or binarized neural networks, are gaining popularity for some extremely energy-sensitive applications).

The overall approach is to minimize movement of data and memory access to conserve energy. Other energy saving features include the option to run slower at lower currents if the current budget is tight.

The chip has 512 kB Flash data memory for network inputs, while the weight memory inside the CNN accelerator is 442 kB (this is interspersed with the processing engines and is not available to the user, as such). This weight memory is configurable to support from 442,000 8-bit integer weights to 3.5 million 1-bit weights.

The accelerator is optimized for CNNs, which are commonly used for image processing applications today, but it can also support non-image applications by converting input data into an image. Maxim uses this technique for data like heart rate and blood pressure, but also for audio applications like keyword spotting.
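
The weight-memory figures quoted above follow from simple bit arithmetic. The sketch below is illustrative only: it assumes weights are packed densely with no per-layer overhead, and the function names `weight_capacity` and `fits` are hypothetical rather than part of any Maxim tool.

```python
# Figure quoted in the article: room for 442,000 weights at 8 bits each.
WEIGHT_CAPACITY_8BIT = 442_000
# Weight widths the article says the accelerator supports.
SUPPORTED_WEIGHT_BITS = (1, 2, 4, 8)


def weight_capacity(bits_per_weight: int) -> int:
    """Return how many weights fit, assuming ideal dense packing (our assumption)."""
    if bits_per_weight not in SUPPORTED_WEIGHT_BITS:
        raise ValueError(f"unsupported weight width: {bits_per_weight}")
    total_bits = WEIGHT_CAPACITY_8BIT * 8
    return total_bits // bits_per_weight


def fits(num_weights: int, bits_per_weight: int) -> bool:
    """Check whether a network with num_weights parameters fits at a given width."""
    return num_weights <= weight_capacity(bits_per_weight)


for bits in SUPPORTED_WEIGHT_BITS:
    print(f"{bits}-bit weights: {weight_capacity(bits):,}")
```

Under these assumptions the 1-bit case comes to 3,536,000 weights, which matches the article's rounded figure of 3.5 million. A real network may fit fewer weights than this upper bound, depending on how layers map onto the 64 processors.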
Maxim’s software tool bridges the machine learning and embedded development universes. It works with PyTorch and TensorFlow to build files that are compatible with the chip’s limitations. It also converts the neural network into C code before configuring and loading the weights into the CNN.
Tool flow for the MAX78000. Maxim’s synthesis tool creates C code that runs on the device (Image: Maxim)
Micro Joules

According to Ardis, the device is flexible enough to run entire applications or to function as an AI co-processor, depending on the application. It packs enough punch to handle image processing including object detection and classification or facial recognition, audio apps like keyword spotting and noise cancellation, and time-series data processing such as heart rate or predictive maintenance applications.
The MAX78000 eval kit running Maxim’s facial recognition demo (Image: Maxim)
In internal tests, Maxim ran the same neural networks on one of its power-optimized Arm Cortex-M4F based microcontrollers and the MAX78000. The MAX78000 performed image classification on the MNIST dataset at 1100x lower energy and 400x faster versus a software solution on the Cortex-M4F, and it was 600x lower energy and 200x faster on keyword spotting than the Cortex-M4F.

Maxim has a facial recognition demo running on the MAX78000 which collects the image and runs the inference in around 14 ms, consuming 400 µJ per inference (most of the energy is consumed by the image capture). On a keyword spotting demo which listens for 20 different keywords, the MAX78000 can run an inference in 2.0 ms and consumes 140 µJ.

Roadmap

Ardis said that the MAX78000 is hopefully the first in a family of chips with Maxim’s CNN accelerator built in.

“Our next chip will have a bigger accelerator,” he said. “We’re going to try to get much higher performance image processing, maybe even video processing in a chip. We’ll be continuing to add features that our customers want, maybe some operators that we don’t support today… whatever the newest, fanciest activation function is.”

There is also plenty of scope for building more application-specific SoCs by combining the CNN accelerator with existing IP from Maxim’s wearable, industrial and financial terminals microcontroller businesses. Ardis proposed an authentication badge with facial recognition that also needs NFC capability, for example, or a smaller CNN accelerator paired with Bluetooth capability and a smaller memory for IoT sensor nodes.

The MAX78000 comes in an 8x8mm BGA package and is available now, along with an evaluation kit. A 4x4mm wafer scale packaged version will be available shortly.
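
For readers who want to relate the per-inference figures quoted under Micro Joules to a power budget, the sketch below converts energy and duration into average power and into inferences per joule. It is a back-of-the-envelope calculation only: the helper names are hypothetical, it assumes the quoted energy is spread evenly over the quoted time, and it ignores sleep current and anything else drawing power between inferences.

```python
def average_power_mw(energy_uj: float, duration_ms: float) -> float:
    """Average power in milliwatts; microjoules divided by milliseconds gives mW."""
    return energy_uj / duration_ms


def inferences_per_joule(energy_uj: float) -> float:
    """Number of inferences one joule would pay for, ignoring all idle consumption."""
    return 1_000_000 / energy_uj


# Figures quoted in the article for Maxim's two demos.
demos = {
    "facial recognition": (400.0, 14.0),  # (µJ per inference, ms per inference)
    "keyword spotting": (140.0, 2.0),
}

for name, (energy_uj, duration_ms) in demos.items():
    power = average_power_mw(energy_uj, duration_ms)
    count = inferences_per_joule(energy_uj)
    print(f"{name}: ~{power:.1f} mW average, ~{count:.0f} inferences per joule")
```

With the article's numbers this works out to roughly 28.6 mW averaged over the facial recognition demo (a figure that includes image capture) and 70 mW during a keyword spotting inference. These are averages over the active window, not datasheet specifications.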
