Glow Compiler Optimizes Neural Networks for NXP MCUs

Article By : Sally Ward-Foxton

NXP is the first MCU maker to support Glow for MCUs, which compiles neural network models into target-optimized object code...

In a sign that machine learning techniques are fast gaining adoption on embedded platforms, NXP announced that it has created a customized implementation of Glow for microcontrollers (MCUs), including some of its i.MX RT family. Glow is a neural network compiler that optimizes neural networks for specific target hardware.

NXP is the first microcontroller vendor to create a customized version of Glow for its hardware. It has done so for the Cortex-M cores and Tensilica HiFi4 DSP core on its i.MX RT685, RT1050 and RT1060 microcontrollers.

The company said that, compared with the standard version of Glow, its custom implementation doubled performance of the CIFAR-10 model on the Arm Cortex-M core, and increased it by a factor of 25 when using an on-chip DSP accelerator.

Glow for MCU targets endpoint applications

NXP’s microcontroller customers are increasingly asking about machine learning (Image: NXP).

There is an increasing drive to use machine learning techniques on embedded platforms, coming from applications that want to process images or voice, or perform anomaly detection.

“Based on all the ML activities over the last couple of years, people have been kicking the tires, but it really hasn’t started to get interesting until most recently,” said Markus Levy, director of AI and machine learning technologies at NXP. “Up until about six months ago, I was getting about one customer request every couple of weeks. And now it seems like we’re getting customer questions and new customers almost on a daily basis. So it’s really started to pick up, and ML is definitely becoming ML for all, and not just for a select few.”

While most NXP customers are using machine learning on its application processor and crossover processor products, particularly the i.MX 8M+ application processor, which includes a dedicated machine learning accelerator block, microcontroller customers are also showing an interest.

“We’re seeing pickup on the MCU side as more and more people are realizing that with a 600 megahertz or one gigahertz processor, you can do a lot of ML,” he said.

Glow up
Glow (Graph LOWering neural network compiler) is an open-source project initially developed at Facebook to optimize code for its cloud hardware, but it works just as well at the tiny end of the spectrum, explained Levy.

“Facebook has no intention of operating in the MCU environment, their main purpose behind Glow was to develop a compiler to support cloud-based accelerators,” said Levy. “But it operates on ONNX [open neural network exchange] models, which is adaptable to any training framework. There’s nothing inherently that makes something a cloud-based model or an edge-based model.”

Glow takes neural network models and generates highly optimized code for the target hardware. Compared to typical inference engines, which use a just-in-time compiler with a runtime component (i.e., the runtime looks at the graph and parses each layer as it comes in), Glow has an ahead-of-time mode. Compiling ahead of time removes significant processor and memory overhead.

“Glow takes a different approach. Instead of an inference engine with a runtime component to it, this is a neural network compiler that actually creates object code,” said Levy. “So you basically end up with object code just like you do any other part of the application. And because of that, it runs a lot faster. And that’s why we’re really excited about it.”
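The difference Levy describes can be illustrated with a toy sketch. This is not Glow's code or API: the three-layer model format and the names `interpret` and `compile_model` are hypothetical, chosen only to contrast a runtime engine that walks the graph on every inference with a compiler that emits straight-line code once, ahead of time.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a list of (operation, parameter) layers.
# Real networks and Glow's internal representation are far richer.
Layer = Tuple[str, float]
MODEL: List[Layer] = [("scale", 2.0), ("add", 1.0), ("relu", 0.0)]


def interpret(model: List[Layer], x: float) -> float:
    """Runtime-engine style: parse and dispatch each layer on every call."""
    for op, param in model:
        if op == "scale":
            x = x * param
        elif op == "add":
            x = x + param
        elif op == "relu":
            x = max(x, 0.0)
        else:
            raise ValueError(f"unknown op: {op}")
    return x


def compile_model(model: List[Layer]) -> Callable[[float], float]:
    """Ahead-of-time style: generate specialized code once, with no
    graph parsing or dispatch left in the inference path."""
    lines = ["def run(x):"]
    for op, param in model:
        if op == "scale":
            lines.append(f"    x = x * {param!r}")
        elif op == "add":
            lines.append(f"    x = x + {param!r}")
        elif op == "relu":
            lines.append("    x = x if x > 0.0 else 0.0")
        else:
            raise ValueError(f"unknown op: {op}")
    lines.append("    return x")
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # stands in for emitting object code
    return namespace["run"]  # type: ignore[return-value]


run = compile_model(MODEL)  # done once, "at build time"
```

In this sketch, `run(3.0)` and `interpret(MODEL, 3.0)` return the same value, but `run` carries no interpreter loop. Changing `MODEL` means calling `compile_model` again.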

Glow for MCU devices creates object code
(Left) implementation using TensorFlow Lite, (right) implementation using Glow compiler. (Image: NXP)

While this is great for performance, Levy said, it’s at the cost of flexibility.

“We have a lot of customers using TensorFlow Lite both on microcontroller [Cortex-M] as well as i.MX [Cortex-A] products. With the runtime engine, you have this installed in your system and you can easily reload new models, so it allows you flexibility to change the model,” he said. “Whereas with Glow, once you compile it for the application, you would have to regenerate another object file if you wanted to change models. The trade-off here is flexibility for performance and memory. And we find that in MCU-based edge applications, where they’re fighting for every single bit of performance and memory, this tends to be more favorable.”

Glowing results
While Glow itself is target agnostic (it can be compiled for x86, FPGA or Arm architectures), purpose-built software libraries like those built by NXP can further optimize code to exploit the features of specific hardware. NXP has spent “well over a year” integrating target-specific libraries for Arm Cortex-M cores (CMSIS-NN) and the Tensilica HiFi4 DSP (NNLib), with platform-specific optimizations for the i.MX RT series.

NXP ran CIFAR-10 on the i.MX RT685 and compared its results against Glow out of the box (without tapping into any target-specific software accelerator libraries). Using NXP’s implementation of CMSIS-NN on the Cortex-M doubled performance, while using NXP’s implementation of NNLib on the HiFi4 DSP boosted performance by about 25x.

This implementation of Glow is available in NXP’s eIQ machine learning software development environment, via the MCUXpresso software development kit.

“We still have a lot of customers that are very much on the TensorFlow Lite wagon, and we’ll continue to support that,” Levy said. “I hope people start to realize the significant benefits of Glow, and that really takes off as well… We’ll look at this in the future and see what other types of devices we will enable.”

While there are no current plans to expand customized Glow for MCU support to other NXP products, any future expansion will almost certainly be driven by customer demand, Levy said.
