Eta Compute has developed a high-efficiency ASIC and new neural-network-based artificial intelligence (AI) software to bring machine intelligence to edge and mobile devices without relying on cloud resources.

Future mobile devices, which are always active in the IoT ecosystem, need a solution that delivers enough processing power for machine intelligence at low power consumption, for applications such as speech recognition and imaging.

These are the types of applications for which Eta Compute designed its ECM3531.

The IC is based on ARM Cortex-M3 and NXP Coolflux DSP processors. It tightly integrates the DSP with a microcontroller architecture to significantly reduce the power required for embedded machine intelligence. The SoC includes an analog-to-digital converter (ADC) sensor interface and highly efficient PMIC circuits. The chip also includes I2C, I2S, GPIOs, RTC, PWM, POR, BOD, SRAM and Flash. The patented hardware architecture (DIAL) is combined with fully customizable CNN-based algorithms to perform machine learning inference in hundreds of microwatts.

The processor, named Tensai, can be used with the popular TensorFlow or Caffe software. This solution can support a wide range of audio, video and signal processing applications where power is a strict constraint, such as those in the unmanned aerial vehicle (UAV), Internet of Things (IoT) and wearable markets.

The ECM3531SP includes pretrained machine learning applications for speech recognition and keyword spotting, the ECM3531PG includes a pretrained photoplethysmogram (PPG) application, and the ECM3531SF includes machine learning algorithms for fusing gyroscope, magnetometer, and accelerometer data (Figure 1).

Figure 1: ECM3531 with its development board [Source: Eta Compute]

The patented hardware architecture is combined with Eta Compute's fully customizable algorithms, based on CNN, LSTM, GRU and SNN (spiking neural network) models, to perform machine learning inference within a few milliwatts. Eta provides kernel software for convolutional neural networks on the Coolflux DSP; the kernels scale better than those of other neural network (NN) implementations, and the asynchronous technology is expected to cut power by a further 30%.

Tensai's computational properties deliver a 30-fold power reduction on a specific CNN-based image classification benchmark compared with Cortex-M7-class microcontrollers. Eta Compute has reached 0.04 mJ per image for a workload of 8 million operations (Figure 2).
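To put the benchmark figures in perspective, the energy per operation follows directly from the two numbers quoted above. The short sketch below does that arithmetic; the function names and the one-image-per-second rate are illustrative assumptions, not figures published by Eta Compute.

```python
# Figures quoted in the text.
ENERGY_PER_IMAGE_J = 0.04e-3   # 0.04 mJ per classified image
OPS_PER_IMAGE = 8_000_000      # 8 million operations per image


def energy_per_operation(energy_j: float, operations: int) -> float:
    """Return the average energy per operation, in joules."""
    return energy_j / operations


def average_power_w(energy_j: float, images_per_second: float) -> float:
    """Return the average power at a steady classification rate, in watts."""
    return energy_j * images_per_second


if __name__ == "__main__":
    pj_per_op = energy_per_operation(ENERGY_PER_IMAGE_J, OPS_PER_IMAGE) * 1e12
    print(f"{pj_per_op:.1f} pJ per operation")
    # One image per second is a hypothetical rate, chosen only for illustration.
    uw = average_power_w(ENERGY_PER_IMAGE_J, 1.0) * 1e6
    print(f"{uw:.0f} uW at 1 image/s")
```

Under these assumptions the quoted benchmark works out to roughly 5 pJ per operation, and classifying one image per second would average about 40 μW for the inference itself.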

Figure 2: accuracy versus SNR [Source: Eta Compute]

The high-efficiency ASIC and the CNN software developed by Eta Compute avoid the need for numerous training samples in edge applications, where resources (both memory and computation) are limited. In a recent keyword-recognition benchmark, Eta Compute reported a model efficiency two to three orders of magnitude better than that of various other neural network variants, while consuming only 2 mW of power.

For sensing applications, particularly those using motion and environmental sensors, the Eta Compute methodology allows sensor hubs to run more extensive sensor algorithms by providing data and updates in real time from mobile network devices and the Internet of Things (IoT). A collaboration with Rohm Semiconductor has enabled the development of sensor nodes compatible with the Wireless Smart Ubiquitous Network (Wi-SUN). The nodes will combine Rohm sensor technology and Eta Compute's low-power MCUs to offer solutions for intelligent utility and IoT networks in smart cities. They will be designed for frequent low-latency communications, drawing less than 1 μA at rest and, more importantly, only 1 mA during sensing.
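The rest and active currents quoted above determine a node's average draw once a duty cycle is chosen. The sketch below shows that calculation; the 1% duty cycle and the 220 mAh battery used in the example are hypothetical values picked for illustration and do not come from Eta Compute or Rohm.

```python
REST_CURRENT_A = 1e-6    # upper bound quoted in the text: under 1 uA at rest
SENSE_CURRENT_A = 1e-3   # quoted in the text: 1 mA while active


def average_current(duty_cycle: float,
                    active_a: float = SENSE_CURRENT_A,
                    rest_a: float = REST_CURRENT_A) -> float:
    """Return the time-averaged current (A) for a given active duty cycle."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_a + (1.0 - duty_cycle) * rest_a


def battery_life_hours(capacity_mah: float, avg_current_a: float) -> float:
    """Return an idealized battery life, ignoring self-discharge and radio use."""
    return capacity_mah / (avg_current_a * 1e3)


if __name__ == "__main__":
    # Hypothetical scenario: node active 1% of the time, 220 mAh battery.
    avg = average_current(0.01)
    print(f"average current: {avg * 1e6:.2f} uA")
    print(f"idealized battery life: {battery_life_hours(220.0, avg):.0f} h")
```

Because the rest current is three orders of magnitude below the active current, the average is dominated by how often the node wakes up, which is why the duty cycle matters more than the rest figure in this kind of estimate.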

Eta Compute believes that neural network technology will play a key role in enabling intelligent edge devices. Because they can learn from and process sensor data directly at the edge in an energy-efficient manner, new ASICs will ease the bandwidth requirements of sending raw data to a cloud-based learning service. The energy efficiency of neuromorphic processors will also allow "always on" solutions that are not held back by their power requirements.

Machine learning and neural networks
There is a lot of talk today about artificial neural networks, especially as they are used in the field of AI and machine learning (ML).

Artificial neural networks (ANNs) are a class of algorithms used to solve complex problems that cannot be easily codified; they are a cornerstone of machine learning (ML). They are called "neural networks" because the behavior of the nodes that compose them vaguely resembles that of biological neurons. A neuron receives signals from various other neurons via synaptic connections and integrates them. If the resulting activation exceeds a certain threshold, it generates an action potential that propagates through its axon to one or more neurons.
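The integrate-and-fire behavior described above can be sketched as a weighted sum compared against a threshold. This is a deliberately simplified model; the function name, weights, and threshold below are made up for illustration.

```python
from typing import Sequence


def neuron_fires(inputs: Sequence[float],
                 weights: Sequence[float],
                 threshold: float) -> bool:
    """Integrate weighted inputs and fire if the activation exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each weight plays the role of a synaptic connection strength.
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation > threshold


if __name__ == "__main__":
    weights = [0.5, 0.9, 0.4]   # illustrative values only
    print(neuron_fires([1, 0, 1], weights, threshold=0.8))  # activation 0.9
    print(neuron_fires([0, 0, 1], weights, threshold=0.8))  # activation 0.4
```

With these values, the first input pattern pushes the activation above the threshold and the neuron fires, while the second does not.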

A neural network can be imagined as a series of "layers" of nodes, each of which is connected to the nodes of the next layer. In biological neurons, the action potential is transmitted in full once the potential difference across the membrane exceeds a certain threshold. In a sense, this is also true for "artificial" neurons. The only difference is that the response behavior can be adapted as needed and is determined by the activation function. The intricate part of a neural network is how it learns. Learning takes place when there is some feedback, i.e., a response that lets the network check whether it has actually learned what it is meant to learn.
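A minimal sketch of the layered structure follows: each layer computes weighted sums of the previous layer's outputs and passes them through an activation function. The weights here are hand-picked (not learned) so that a two-layer network with a step activation computes XOR; all names and values are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def step(x: float) -> float:
    """All-or-nothing response, similar to a biological action potential."""
    return 1.0 if x > 0.0 else 0.0


def sigmoid(x: float) -> float:
    """Smooth alternative to the step function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: Sequence[float], layer: Layer,
                  activation: Callable[[float], float]) -> List[float]:
    """Compute one layer: one weighted sum plus bias per node, then activation."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def network_forward(inputs: Sequence[float], layers: Sequence[Layer],
                    activation: Callable[[float], float]) -> List[float]:
    """Feed the inputs through every layer in order."""
    values = list(inputs)
    for layer in layers:
        values = layer_forward(values, layer, activation)
    return values


# Hand-picked weights: hidden nodes act as OR and AND, output is OR-and-not-AND.
XOR_LAYERS: List[Layer] = [
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),
    ([[1.0, -1.0]], [-0.5]),
]

if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(a, b, network_forward([a, b], XOR_LAYERS, step))
```

Swapping `step` for `sigmoid` changes only the response curve of each node, which is the sense in which the activation function determines the neuron's behavior.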

Deep learning is an ML technique that uses algorithms loosely modeled on the way the human brain works. These algorithms rely on multi-layer neural networks that learn to perform a specific task.

The learning algorithms used to train neural networks fall into three categories: supervised, unsupervised, and reinforcement learning. The choice depends on the field of application for which the network is designed and on its type (feedforward or feedback). In networks that learn through reinforcement, there are neither input-output example pairs nor an explicit adjustment of the outputs to be optimized. The network learns exclusively from interaction with the environment (Figure 3).

Figure 3: Neural network
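As a small illustration of supervised learning driven by feedback, the sketch below trains a single perceptron on the logical AND function. The error between the target and the prediction is the feedback signal that adjusts the weights. This is the classic perceptron rule, used here only as an example; it is not Eta Compute's learning method, and the names and hyperparameters are arbitrary.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (inputs, target label 0 or 1)


def predict(weights: Sequence[float], bias: float,
            inputs: Sequence[float]) -> int:
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


def train_perceptron(samples: Sequence[Sample], epochs: int = 20,
                     learning_rate: float = 0.1) -> Tuple[List[float], float]:
    """Train with the perceptron rule: nudge weights by the prediction error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Feedback: the difference between what was expected and produced.
            error = target - predict(weights, bias, inputs)
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


AND_SAMPLES: List[Sample] = [
    ((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1),
]

if __name__ == "__main__":
    w, b = train_perceptron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, "->", predict(w, b, inputs), "expected", target)
```

Because AND is linearly separable, the rule settles on weights that classify all four input pairs correctly within the default number of epochs.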

The ASIC approach
A convolutional neural network (CNN) requires repeated convolutions throughout the pipeline, and the number of operations can be extremely high for video applications. These algorithms also tend to be highly parallel, requiring data to be divided among different processing units and making it essential to connect the pipeline in the most efficient way. Furthermore, a significant amount of data moves back and forth between memory and the processing units. Deep learning chipsets are designed to address these aspects and optimize performance, power, and memory.
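To make the operation count concrete, the sketch below implements a naive single-channel 2D convolution (in the cross-correlation form that CNN frameworks typically use, stride 1, no padding) and a helper that counts multiply-accumulate (MAC) operations for one layer. The layer dimensions in the example are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image with stride 1 and no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    output: Grid = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            # One multiply-accumulate per kernel element per output pixel.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def conv_layer_macs(out_h: int, out_w: int, kernel_h: int, kernel_w: int,
                    in_channels: int, out_channels: int) -> int:
    """Count the MACs needed for one convolutional layer."""
    return out_h * out_w * kernel_h * kernel_w * in_channels * out_channels


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, -1]]
    print(conv2d_valid(image, kernel))
    # Hypothetical layer: 32x32 output, 3x3 kernel, 3 input and 16 output channels.
    print(conv_layer_macs(32, 32, 3, 3, 3, 16), "MACs")
```

Every output pixel is independent of the others, which is why the workload parallelizes well, and the same input values are read many times, which is why memory traffic becomes a design concern.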

ASIC processors provide a combined software and hardware engine capable of running a deep learning framework. Companies are also developing their boards so that they can be inserted into servers with minor code changes. Application developers can code a deep learning algorithm, set some options, and continue to develop software just as they would for a central processing unit (CPU). The ASIC would provide the functionality the application needs and expose only a few adjustable parameters.

Because the design focuses on solving a single problem, it can reach performance levels (in terms of processing speed and power consumption) that are difficult to obtain with more generic solutions.

Machine learning workloads require so much data processing that they have often called for expensive supercomputers. The next generation of neural network processors will try to strike a balance between computational efficiency and the power needed for the processing.

GPUs cannot properly be classified as ASICs, but they are designed for specific applications. Under a very loose definition, the GPU is an ASIC used to process graphics algorithms. GPUs are fast and relatively flexible. The alternative is to design a custom ASIC dedicated to performing fixed operations extremely fast. Custom ASICs such as Google's TPU lend themselves well to a high degree of parallelism and to the processing of neural networks (Figures 4 and 5).

Figure 4: GPU platform for CNN [Source: Nvidia]

ARM chipset
CPUs have the advantage of being infinitely programmable, with decent but not stellar performance. FPGAs from Intel and Xilinx, on the other hand, offer excellent performance at low power, with more software flexibility. FPGAs are mainly used in ML inference. For specific workloads, however, the performance of an FPGA does not come close to that of a high-end GPU.

Eta Compute designed its new IC to offer the best of both worlds. Its learning mode removes the need for many training samples, which makes it better suited to edge devices where computing resources are limited. This calls for highly efficient learning models across applications that push those resources to their limit, all while operating at low power.

The demand for deep learning and statistical inference is pushing the hardware industry toward ML-specific hardware. As artificial intelligence (AI) applications expand, this demand for specialized ML devices is driving hardware into its next stage of evolution. It will be fascinating to see the impact of these technologies in the health, medical, transport, and robotics fields.

Allied Market Research’s report, “Machine Learning Chip Market by Type and Application: Global Opportunity Analysis and Industry Forecast, 2014 – 2022”, indicates that the global machine learning chip market is projected to reach $37.8 billion in 2025, registering a CAGR of 40.8% from 2017 to 2025.

Figure 5: the market for Machine Learning solutions [Source: “The Future of Machine Learning Hardware”, Phillip Jama, Sept. 2016]
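The growth figures quoted from the Allied Market Research report can be related through the standard compound annual growth rate (CAGR) formula. The sketch below back-calculates the 2017 market size implied by the quoted 2025 value and growth rate; that implied starting value is derived here for illustration and is not a figure stated in the text.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def implied_start_value(end_value: float, rate: float, years: int) -> float:
    """Back-calculate the starting value implied by an end value and a CAGR."""
    return end_value / (1.0 + rate) ** years


if __name__ == "__main__":
    END_2025_BILLION = 37.8     # quoted in the text
    QUOTED_CAGR = 0.408         # quoted in the text: 40.8%
    YEARS = 2025 - 2017         # eight compounding periods

    start = implied_start_value(END_2025_BILLION, QUOTED_CAGR, YEARS)
    print(f"implied 2017 market: ${start:.2f} billion")
    print(f"round-trip CAGR: {cagr(start, END_2025_BILLION, YEARS):.1%}")
```

Under these assumptions, the quoted projection corresponds to a market of roughly $2.4 billion in 2017 growing about fifteenfold over eight years.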