BARCELONA — GreenWaves Technologies, a startup based in Grenoble, France, launched an apps processor designed to do image, sound and vibration AI analysis on battery-operated sensing devices. The processor, called GAP8, is built on the RISC-V and PULP open-source projects.

Greenwaves’ first sample chip just came back last week from TSMC, which built it using its 55nm low power process. With this brainchild in hand, the company is pitching its GAP8 processor and GAP8 software development kit this week both at Mobile World Congress here and Embedded World in Nürnberg, Germany.

Mike Demler, senior analyst at the Linley Group, told us, “It’s the first time I’ve seen someone add a neural engine to an MCU-class processor.”

The move by the French startup illustrates how the AI frenzy is infecting even the IoT world, where most edge devices are both resource- and power-constrained.   

Founded in 2014, GreenWaves didn’t originally aim to design embedded AI processors. Its initial goal was to run an innovative orthogonal frequency-division multiplexing (OFDM) algorithm known as GreenOFDM on a processor. The company recently reset its sights on machine learning applications, acknowledged Loïc Liétar, co-founder and CEO of GreenWaves. This pivot became inevitable, explained Liétar, when he saw “far more [market] traction” on the processor’s ability to do “content understanding (image, sound, vibration).”

GreenWaves was born when two projects merged into one. Liétar was originally interested in solving the high power consumption of OFDM and was looking for an appropriate processor architecture onto which to map his algorithm. Eric Flamand, Liétar’s long-time friend and now GreenWaves’ CTO, was then developing an ultra-low power processor for content understanding. After the two decided to join forces as a single startup, they leveraged Flamand’s PULP-based architecture to offer both machine learning functions and GreenOFDM.

Asked what happened to GreenOFDM, Liétar noted, “A couple of customers are interested in the SW modem capabilities of GAP8, albeit not for GreenOFDM, which would require the development of a specific power amplifier to deliver on its promise.”

Put simply, GreenWaves’ GAP8 consists of nine RISC-V cores. One serves as a fabric controller managing peripherals and communication with the outside world. The other eight cores are organized in a cluster with shared data and instruction memory. The cluster also has an integrated hardware convolution engine that accelerates inference calculations for convolutional neural networks (CNNs).
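To make concrete what a convolution engine accelerates, here is a minimal sketch of the multiply-accumulate loop at the heart of a CNN layer. It is a generic, textbook-style illustration in plain Python, not GreenWaves’ implementation; the function names `conv2d_valid` and `mac_count` are hypothetical, and the sketch assumes a single channel, stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Direct 2D convolution as used in CNNs (no kernel flip),
    single channel, stride 1, no padding. Illustrative only."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            # Each output pixel costs kh * kw multiply-accumulates (MACs).
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def mac_count(ih, iw, kh, kw):
    """Total MACs for one single-channel 'valid' convolution."""
    return (ih - kh + 1) * (iw - kw + 1) * kh * kw


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_valid(image, kernel))
    print(mac_count(3, 3, 2, 2))
```

The four nested loops are the reason dedicated hardware helps: the MAC count grows with the product of output size and kernel size, and every layer of a CNN repeats this pattern across many channels.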

According to Greenwaves, the fabric controller and cluster live in separate voltage and frequency domains, so that each consumes power only when necessary. Greenwaves also used the standard RISC-V ISA extension mechanism to add instructions that boost performance for DSP-centric operations, which are frequently found in the algorithms executed on the cluster.

GAP8 Architecture (Source: Greenwaves)


Liétar explained, “For most developers, GAP8 is programmed just like any MCU.”

When compute-intense tasks need to be launched, they go to the cluster through the APIs of a rich compute library included in the GAP8 SDK. “A tool-driven methodology also allows trained CNNs described with an AI framework to be optimized for and ported onto GAP8,” he added.


Where AI, IoT and MCU meet
Greenwaves hopes to position GAP8 into the market whirlwind where AI, IoT and MCU meet.

GreenWaves promises that GAP8 will deliver “scalable compute performance at dynamically adjustable power consumption points from 1mW to 60mW and standby and data acquisition in the range of nAs to µAs.”

Asked to compare GAP8 with other neural network processors, Liétar noted that embedded vision processors/dedicated CNN processors with TFLOPS of computing power can run complex machine learning applications. However, they consume too much power to get designed into battery-operated devices.

This reality opens a sweet spot for Liétar and GAP8, somewhere between ultra-low power MCUs (100s of MOPS), such as STMicroelectronics’ STM32, and high-end low power MCUs/mid-range apps processors (several GOPS), such as Allwinner’s apps processors or NXP’s i.MX apps processors.

GAP8, he claimed, can prove 20 times more energy-efficient than mid-range apps processors, while cutting system cost by a factor of two to three.

GAP8 Positioning: Energy efficiency vs. Computing power (Source: Greenwaves)

The goal for GAP8 is to deliver “a flexible compute engine that can accelerate a wide range of algorithms from CNN to traditional machine vision, sound or vibration analysis at an absolute low power point,” he noted.

Asked about target applications for GAP8, Liétar cited embedded systems for counting people and objects for smart cities, vibration analysis for the industrial market, robotic control/navigation for consumer robotic vacuum cleaners, keyword spotting for smart speakers and object recognition for home surveillance systems.

Consider, he said, traffic lights in a smart city. With machine learning capabilities, a traffic light can count how many cars are present at any given time. In a smart office space, management can install a system to see how many desks are free to use.

All of this raises one question: Why run such machine learning applications on battery-operated systems? Don’t traffic signals and smart offices come with their own power? Liétar said, “It turns out that those who want to do such analysis are not usually the same people operating traffic lights or smart offices.” One needs to be able to attach such an AI feature as an independent battery-operated unit to the existing infrastructure, he explained.

Greenwaves says GAP8 can do always-on face detection with a few milliwatts of power, while indoor people-counting and presence-detection could be done without replacing batteries for years.
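A rough back-of-the-envelope calculation shows how milliwatt-level active power and microwatt-level standby can translate into multi-year battery life when a device wakes only occasionally. The sketch below is illustrative only: the battery capacity, voltage, duty cycle and power figures are assumptions chosen for the example, not numbers from GreenWaves, and the model ignores battery self-discharge, regulator losses and the power drawn by sensors and radios.

```python
def average_power_mw(active_mw, standby_mw, duty_cycle):
    """Time-weighted average power for a device that is active
    for a fraction `duty_cycle` of the time and in standby otherwise."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return active_mw * duty_cycle + standby_mw * (1.0 - duty_cycle)


def battery_life_years(capacity_mah, voltage_v, avg_power_mw):
    """Idealized battery life: stored energy divided by average power."""
    energy_mwh = capacity_mah * voltage_v
    hours = energy_mwh / avg_power_mw
    return hours / (24 * 365)


if __name__ == "__main__":
    # Assumed values for illustration: 3,600 mAh cell at 3.6 V,
    # 10 mW while processing, 0.01 mW in standby, active 1% of the time.
    avg = average_power_mw(active_mw=10.0, standby_mw=0.01, duty_cycle=0.01)
    years = battery_life_years(capacity_mah=3600, voltage_v=3.6, avg_power_mw=avg)
    print(f"average power: {avg:.4f} mW, idealized life: {years:.1f} years")
```

Under these assumed numbers the average draw is dominated by the brief active bursts, which is why both a low active power point and a very low standby current matter for this class of device.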

Asked about customers, Liétar said that Greenwaves has at least one customer, with whom it has been working since last fall. Since Greenwaves launched its software development kit, the feedback has been encouraging. “We have seen at least 20 customers have downloaded it, since we launched the SDK,” he noted, “although we can’t tell you how active they are.”

Coming soon: GAPUINO
GreenWaves is getting ready to roll out its GAP8 hardware development kit in April, priced at 100 euros (about $123). Included in the kit are the GAPUINO board and the GAP8 SDK.

GAPUINO is an Arduino Uno-compatible master or shield with a camera connector for external cameras, according to the company. It can be powered via a battery (SAF17500), DC connector or USB.

GreenWaves also created a sensor board (Arduino shield format) containing several sensors: four MP34DT01 microphones, a VL53 time-of-flight sensor, an IR sensor, a pressure sensor, a light sensor, a temperature and humidity sensor and a 6-axis accelerometer/gyroscope.



Secret sauce in ultra-low power GAP8
To push the limits of GAP8’s energy efficiency, Greenwaves has applied a “set of levers in a consistent and balanced manner,” explained Liétar.

They include an extended RISC-V instruction set architecture to pack more operations into each cycle, an energy optimal sign-off frequency, hardware synchronization, eight-core parallelization, a fast turn-on/switch off function achieved by putting the power management unit inside the chip, a shared instruction cache and a hardware convolution engine.

Loïc Liétar

More specifically, hardware synchronization is important because, for fine-grain loops, the cost of synchronizing work dispatched to the eight cores could amount to as much as 50 percent of the sequential computing time. This would drastically limit efficiency in parallelization, said Liétar. “Doing this synchronization in HW removes this limitation,” he said.
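A simple model illustrates why that overhead matters. The sketch below assumes, as one reading of Liétar’s figure, that synchronization adds a fixed cost equal to a fraction of the sequential run time on top of perfectly divided work; the function name `parallel_speedup` and the model itself are illustrative assumptions, not GreenWaves’ own analysis.

```python
def parallel_speedup(n_cores, sync_overhead_fraction):
    """Speedup over sequential execution when work divides perfectly
    across `n_cores` but synchronization adds a cost equal to
    `sync_overhead_fraction` of the sequential run time.

    T_parallel = T_seq / n_cores + sync_overhead_fraction * T_seq
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if sync_overhead_fraction < 0:
        raise ValueError("sync_overhead_fraction must be non-negative")
    return 1.0 / (1.0 / n_cores + sync_overhead_fraction)


if __name__ == "__main__":
    # Overhead equal to 50% of sequential time vs. no overhead at all.
    print(parallel_speedup(8, 0.5))  # 1.6
    print(parallel_speedup(8, 0.0))  # 8.0
```

In this model, eight cores burdened with a 50 percent synchronization cost run only 1.6 times faster than one core, while removing the overhead restores the full eightfold speedup.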

Meanwhile, having eight independent cores offers more parallelization opportunities, down to fine grain, than VLIW or GPU architectures, claimed Liétar.

GreenWaves comes with a strong academic background. Co-founder and CTO Flamand, who still has a position at ETH Zurich, is a software developer who created DSP instruction extensions for PULPino, a 32-bit RISC-V processor designed by researchers at ETH Zurich and Università di Bologna as part of the Parallel Ultra Low Power (PULP) project.

The Linley Group’s Demler suspects that GreenWaves’ academic background and open-source fervor might have been the key drivers for the company’s processor design. Noting that the startup’s opportunity is mostly lower cost, Demler believes it is likely to face tough competition from ST or NXP. On the higher end, GreenWaves’ rivals will be companies with dedicated neural-network processors and coprocessors like FPGA solutions, he noted.

Acknowledging GAP8’s unique architecture, Demler cautioned, “Just because you’re doing something different doesn’t mean it’s better. As a startup, they are going to have to work hard to get some meaningful design wins. To do that, they need to focus on delivering the whole-product solution for a particular market/application, not just the uniqueness of their architecture.”

Asked if GreenWaves has any plans to license its embedded AI processor as IP, Liétar said, “Never. I’ve been in this business long enough to know that you can’t really make money as an IP vendor unless you are Arm.”

— Junko Yoshida, Chief International Correspondent, EE Times