Latest research on brain-inspired end-to-end analog neural networks promises fast, very low power AI chips, without on-chip ADCs and DACs...
A research collaboration between neuromorphic chip startup Rain Neuromorphics and Canadian research institute Mila has proved that training neural networks using entirely analog hardware is possible, creating the possibility of end-to-end analog neural networks. This has important implications for neuromorphic computing and AI hardware as a whole: it promises entirely analog AI chips that can be used for training and inference, making significant savings on compute, power, latency and size. The breakthrough marries electrical engineering and deep learning to open the door for AI-equipped robots that can learn on their own in the field, more like a human does.
In a paper entitled “Training End-to-End Analog Neural Networks with Equilibrium Propagation,” co-authored by one of the “godfathers of AI,” Turing Award winner Yoshua Bengio, the researchers show that neural networks can be trained using a crossbar array of memristors, similar to solutions used in commercial AI accelerator chips that use processor-in-memory techniques today, but without the corresponding arrays of ADCs and DACs between each layer of the network. The result holds potential for vastly more power-efficient AI hardware.
In a video-link interview with EE Times, Bengio and co-authors Jack Kendall and Ben Scellier, as well as Rain Neuromorphics CEO Gordon Wilson explained the implications of this important work.
“Today, energy consumption and cost are the biggest limiting factors that prevent us from deploying new types of artificial intelligence,” said Wilson. “We really want to find a far more efficient substrate for compute, one that is fundamentally more energy efficient, one that allows us not to limit training to massive data centers, but also moves us into a world where we can imagine independent, autonomous, energy-unlimited devices, learning on their own. And that’s something that we think this [work] is opening the door towards.”
The researchers simulated training end-to-end analog neural networks on MNIST classification (the Modified National Institute of Standards and Technology database of handwritten digits), where the networks performed comparably to, or better than, equivalent-sized software-based neural networks.
Crossbar arrays of memristive elements are the foundation of analog computing techniques. ASICs available today from companies such as Mythic, Syntiant and Gyrfalcon use memory cells as the memristive element, and can perform matrix vector multiplication using very little power compared to CPUs and GPUs. However, most of the power they do use is consumed by the ADCs and DACs needed between each layer of calculations to mitigate device mismatch or slight non-idealities in the properties of the memory cells, which would otherwise affect the accuracy of the final result.
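The physics behind that matrix-vector multiplication is simple to sketch. In an idealized crossbar (not any particular vendor's design), Ohm's law gives the current through each cell as conductance times voltage, and Kirchhoff's current law sums those currents along each output line — the array computes a matrix-vector product in one step:

```python
import numpy as np

# Idealized crossbar sketch: programmable conductances G (one per cell),
# input voltages V on the rows. Ohm's law gives the per-cell current
# I_ij = G_ij * V_j; Kirchhoff's current law sums each output column.
rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # conductances (siemens)
V = np.array([0.2, -0.5, 0.8])           # row input voltages

# Summing the per-cell currents along each row of cells yields
# exactly the matrix-vector product G @ V -- "free" analog compute.
I = (G * V).sum(axis=1)
assert np.allclose(I, G @ V)
```

In a real device the conductances deviate from their programmed values, which is precisely the mismatch problem the article describes.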
These non-idealities are the reason that a neural network hasn’t been implemented entirely on analog hardware so far. While they are troublesome for inference, they are fatal to training, since errors accumulate across the separate forward and backward data paths required by backpropagation (the most common training algorithm used today).
“The reason [other companies] are focusing on inference is because training in analog is actually really hard,” said Kendall, a co-author on the paper who is also Rain Neuromorphics’ CTO. “If you try to do back propagation in analog, you get this effect where the forward activations and backwards activations from the two separate [data] paths… errors due to device mismatch and non-idealities tend to accumulate as you back-propagate through the network. If you look at implementations of backpropagation in analog, they perform very poorly for this reason.”
Wilson’s view is that separating analog training and inference into two separate problems, per current industry practice, is ultimately wrong.
“If you wanted your inference in analog, there’s noise in that system, and for some of the folks who’ve been building analog inference chips, they realize they need to build a whole new paradigm for training, where they insert the noise that mirrors what you have in the analog inference chip,” he said. “It creates a much more costly and inefficient way to do it – because the hardware isn’t matching, you’re using separate hardware for training and inference. But if you combine it into one platform, not only can you continuously and adaptively learn, you don’t have that mismatch between [training and inference] devices.”
Enter Equilibrium Propagation (EqProp), a technique invented in 2017 by Bengio and Scellier. This training algorithm has only one data path, so it avoids the problems backpropagation causes in analog hardware. There’s a caveat, though: EqProp only applies to energy-based networks.
“Energy-based models are a kind of biologically inspired neural network that relies on equilibrium states,” explained Scellier. “In practice, in the last four decades since they were invented, we’ve used conventional digital computers to simulate the laws of physics, and [we’ve done so in a way that] minimizes these energy functions…. The key insight into our work with [Rain Neuromorphics] is that instead of simulating these laws of physics to minimize energies, we use the laws of physics to build these efficient analog networks.”
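To make the two-phase idea concrete, here is a deliberately tiny illustration of EqProp — a single-weight "network", my own toy construction rather than the paper's circuit. In the free phase, the state settles to the minimum of an energy function; in the nudged phase, a small cost term pulls the output toward the target; the weight update contrasts the two equilibria, with no separate backward pass:

```python
import numpy as np

# Toy Equilibrium Propagation on one weight (an illustrative sketch,
# not the paper's analog circuit). Free-phase energy:
#   E(s) = 0.5*s**2 - w*x*s,  minimized at s = w*x (the prediction).
# Nudged phase adds (beta/2)*(s - y)**2, pulling s toward target y.

def free_equilibrium(w, x):
    return w * x                             # argmin_s E(s)

def nudged_equilibrium(w, x, y, beta):
    return (w * x + beta * y) / (1 + beta)   # argmin_s E(s) + (beta/2)(s-y)^2

rng = np.random.default_rng(1)
w, beta, lr = 0.0, 0.1, 0.5
target_w = 1.7                               # ground truth: y = 1.7 * x
for _ in range(200):
    x = rng.uniform(-1, 1)
    y = target_w * x
    s_free = free_equilibrium(w, x)
    s_nudged = nudged_equilibrium(w, x, y, beta)
    # EqProp update: contrast dE/dw = -x*s at the two equilibria,
    # scaled by 1/beta. This approximates the loss gradient.
    w += lr * (x * s_nudged - x * s_free) / beta

print(round(w, 2))   # prints 1.7 -- the learned weight matches the target
```

In hardware, "settling to equilibrium" is just the circuit relaxing under its own physics, which is why the single data path maps so naturally onto analog electronics.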
One surprising result of this new research is that electrical circuit theory has been directly linked to neural networks for the first time, using Kirchhoff’s laws. This means a new box of mathematical tools from electrical engineering can be applied to deep learning and used to transpose from one field to the other.
“What we have accomplished in this work is to bridge the conceptual gap between mathematical energies and physical energies,” said Scellier. “This can give us new insights into how to analyze neural networks, how to analyze energy-based models and how to analyze and train circuits in circuit theory. This is very exciting because there’s a lot of work to do, at the mathematical level.”
The upshot is that while EqProp has existed as a concept since 2017, this new work has helped turn an abstract idea into something that could be physically realized with a circuit. This would make end-to-end analog computation possible, without the need for converting to and from the digital domain at every step.
“We’re using the physics to directly implement the computations we want, rather than having to make very complicated constructions to transform what the physics does into what we would do normally in software,” said Bengio. “That’s why we can save so much in terms of computation, time, energy, and size of the circuits.”
Bengio explained that it isn’t so much that the algorithm learns about the mismatched devices or device non-idealities on the chip, more that it just doesn’t care.
“If you’re able to tweak each of these devices to modify some of its properties, like its resistance, so that the overall circuit performs the thing you want, then you don’t care that each individual, say, multiplier or artificial neuron, doesn’t do exactly the same thing as its neighbor,” said Bengio. “One of the central principles of deep learning is that you want the overall computation, the whole circuit together, to perform the job you’re training it for. You don’t care what each particular one is doing, so long as we can tweak it so that, together with the others, they form a computation that’s what we want.”
Bengio described computational units on a chip as corresponding to neurons in the brain – each is slightly modified as learning progresses, so the end result gets more accurate. But the deep learning process itself doesn’t require mathematically identical compute units. It’s our insistence on doing the calculations in software that’s causing the inefficiencies.
“The way that people have been trying to do it… is to coax the analog devices into trying to be idealized multiplications and additions. And of course it’s hard, and two devices won’t do the same thing,” he said. “You end up having to spend a lot of energy and time in order to achieve the computation, because you’re forcing each of the elements to do this idealized thing that you can write down in an equation, but really you [shouldn’t] care – you don’t need this. You only need the overall circuit, just like in your brain.”
One important disadvantage of an end-to-end analog neural network is that device mismatches and non-idealities are obviously not the same from chip to chip. So some level of training will be required for each chip, rather than simply loading a pre-trained model as we do today. Bengio suggests some sort of initialization could be done in the factory, saying that each chip might not need to be trained from scratch, merely tweaked.
“It’s a little bit like humans,” he said. “I mean, no two people are the same! Because our neurons aren’t exactly the same, and our experience is not exactly the same. So these circuits, they would probably also be like that – no two chips will be doing exactly the same thing.”
Deep learning future
The researchers say this work could guide the development of a new generation of ultra-fast, low-power neural network hardware supporting both inference and on-chip learning. Currently most models are on the order of millions of neurons, but technology like this could eventually enable analog networks that scale to the size of a human brain (86 billion neurons).
Aside from the possibility of efficient, scalable analog chips for AI, one of the wider implications of this work is that analog computation using EqProp as a training framework offers a path for future developments in mainstream deep learning. Many neuromorphic approaches today use a different brain-inspired paradigm based on spiking neural networks (SNNs), which promises energy-efficient training and inference.
“[SNNs] have never actually outperformed back propagation-based models in terms of performance,” said Wilson. “This is because with the algorithm that’s used to train spiking neural networks, STDP [spike timing dependent plasticity], fundamentally you don’t have access to the global gradient information as you would have with back propagation. With our energy-based model, we retain those advantages of back propagation when we move into the analog world.”
The researchers also point out that there is a lack of a theoretical framework for training SNNs, whereas EqProp and energy-based models provide the theoretical framework for training end-to-end analog neural networks (by stochastic gradient descent with a local weight update mechanism). While this may appear to put the two paradigms at odds with each other, both Kendall and Bengio agreed that they will likely be unified eventually.
For its part, Rain Neuromorphics is planning to capitalize on this breakthrough by building specialized hardware for it. The company is working on commercializing two main technologies, said Wilson: end-to-end analog neural network hardware building on this new EqProp-based work, and separately, the company’s memristive nanowire neural network chip (MN3). The two technologies are not directly related, other than that Kendall had a hand in both breakthroughs and both are brain-inspired.
“We will eventually combine these two hardware innovations,” Wilson said. “Originally, we were only commercializing the MN3 as a coprocessor which would still require digital-analog and analog-digital conversion. We are now ultimately planning to combine the MN3 with EqProp to commercialize massive, scaled up, sparse, end-to-end analog neural networks.”
MN3, invented in 2014 at the University of Florida by Kendall and materials science professor Juan Nino (the third cofounder of Rain), is designed to enable scaling of analog compute hardware. While today’s analog chips can perform matrix multiplication very fast, the arrays scale up poorly because the inputs and outputs are limited to two edges of the chip. An MN3 chip has an array of neurons connected by randomly deposited memristive nanowires, which form the synapses, enabling low-power chips that can handle both training and inference.
“The MN3 moves the memristor elements from inside the CMOS to on top of the CMOS, allowing the entire CMOS [layer] to be filled with a grid of inputs and outputs,” said Wilson. “This shift in array architecture allows analog matrix multiplication to scale up massively.”
Today’s analog matrix multiplication arrays have as many as 4000 inputs and 4000 outputs, while the MN3 can scale up to hundreds of thousands of each. The idea uses a special type of sparsity – small-world sparsity – which mirrors the sparsity pattern observed in the brain.
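The article doesn't detail the MN3's actual connectivity pattern, but the general idea of small-world sparsity can be sketched with a Watts-Strogatz-style mask: mostly local connections (a ring lattice) plus a few random long-range shortcuts, keeping the synapse count tiny while keeping any two neurons only a few hops apart:

```python
import numpy as np

# Sketch of a small-world connectivity mask (Watts-Strogatz style);
# an illustration of the concept, not the MN3's actual pattern.
def small_world_mask(n, k, p, seed=0):
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n                  # local ring neighbor
            if rng.random() < p:                  # occasionally rewire to a
                j = (i + 1 + rng.integers(n - 1)) % n  # random long-range shortcut
            mask[i, j] = mask[j, i] = True        # symmetric synapse
    return mask

mask = small_world_mask(n=1000, k=6, p=0.1)
density = mask.sum() / mask.size
print(f"{density:.4f}")   # well under 1% of full connectivity
```

Even at this density, the shortcuts keep the network's path lengths short — the property that lets a sparse array behave, functionally, like a much denser one.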
Rain taped out an MN3 test chip called Cumulus a year ago in TSMC 180 nm, and will tape out a second (larger) version later this year. Tapeout for an EqProp-based test chip is planned for 2021.