Why objective benchmarks conducted by third parties may not yet be the answer.
Among the issues discussed in my last post was why comparing “like with like” is so hard. In practice, researchers tend to choose a benchmark metric that suits their particular technology, then treat the result as the only figure of merit that matters.
In the absence of any alternative, it’s hard to criticize that approach.
However, there is another option: enlisting evaluators not directly involved in technology development. That has been a trend over the last few years, with at least three papers published this year doing just that.
This is laudable, but the papers also illustrate just how difficult it is to get benchmarking right.

Apples and oranges

In a paper from Oak Ridge National Laboratory, the authors selected a range of machine learning tasks that neuromorphic simulators should be able to run, then measured both performance and how much power the tasks consumed.
The varied tasks should have provided a well-rounded view of the systems under test: NEST, Brian, Nengo, and BindsNET, all of which are used to design and simulate different kinds of networks. The simulators were run on a PC, both unaccelerated and accelerated by various means, including GPUs. None of the platforms was neuromorphic hardware, even though some of the simulators can target it.
For practical reasons, run time was limited to 15 minutes. According to co-author Catherine Schuman, the hardware choice reflected the investigators' desire to keep the study relevant to those without advanced equipment. That's a reasonable goal, even if optimizing neuromorphic simulators on classical hardware could be seen as a bit of a contradiction. Completing the study in weeks rather than months (hence the run-time limit) also seems like an obvious decision. However, the result was that only two-fifths of the machines completed some of the tasks, leaving big gaps in the data.
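The mechanics of such a study are straightforward to picture: each simulator-task pair runs under a fixed wall-clock budget, and anything that exceeds the budget is recorded as a gap rather than a result. The sketch below illustrates that pattern; the function and names (`run_task`, `benchmark`) are hypothetical, not taken from the Oak Ridge paper, and a real harness would launch the actual simulators and also log power draw.

```python
# Hypothetical sketch of a time-budgeted benchmark harness, assuming the
# structure implied by the ORNL study: run each (simulator, task) pair
# under a wall-clock cap and record runtime, or None if the cap expired.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

TIME_BUDGET_S = 15 * 60  # the study's 15-minute cap

def run_task(simulator: str, task: str) -> float:
    """Stand-in for launching a simulator on a benchmark task; a real
    harness would invoke NEST, Brian, Nengo, or BindsNET here."""
    start = time.perf_counter()
    # ... simulator-specific work would go here ...
    return time.perf_counter() - start

def benchmark(pairs, budget_s=TIME_BUDGET_S):
    """Return {(simulator, task): runtime in seconds, or None on timeout}."""
    results = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        for sim, task in pairs:
            future = pool.submit(run_task, sim, task)
            try:
                results[(sim, task)] = future.result(timeout=budget_s)
            except TimeoutError:
                # Did not finish within the budget: a gap in the data.
                results[(sim, task)] = None
    return results
```

Recording explicit `None` entries, rather than silently dropping failed runs, is what makes the gaps in the results grid visible when the data is later compared.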
Another experiment on robotic path planning from FZI Research Center for Information Technology in Karlsruhe, Germany, confronted a different problem. The SpiNNaker system from the University of Manchester was chosen as a representative neuromorphic technology, then compared with a system using Nvidia’s Jetson boards designed to accelerate machine learning.
SpiNNaker was originally designed more as a simulator than as actual neuromorphic hardware (in contrast to SpiNNaker 2), and so fared poorly in terms of power efficiency. Other low-power neuromorphic chips, such as Intel's Loihi, were not tested.
Since SpiNNaker is part of the Human Brain Project, of which FZI is a participant, it’s not surprising researchers used what was available. Indeed, these might well have been the right comparisons for their specific purposes. Whether the results really represent a useful benchmarking exercise is less clear.
Finally, a project at the University of Dresden, in collaboration with the creators of Nengo and SpiNNaker, was much less ambitious in its goals: comparing SpiNNaker 2 with Loihi on keyword spotting and adaptive control tasks. (Spoiler alert: SpiNNaker 2 was more energy efficient for the former, Loihi for the latter.)
Comparing just two systems may seem to make this a less important benchmarking study (though it fulfilled some other important goals). But it may also have been the only way researchers could generate a fair and useful comparison.
That approach also demonstrates the difficulty.
In a 2018 commentary on neuromorphic benchmarking, Mike Davies, head of Intel's Loihi project, suggested a suite of tasks and metrics that could be used to measure performance. These range from keyword spotting and classification of the Modified National Institute of Standards and Technology (MNIST) database of handwritten digits to playing sudoku, gesture recognition, and moving a robotic arm.
Perhaps Davies’ most compelling suggestion was pursuing the grander challenge from robotics and AI: creating contests in which machines can compete directly against each other (RoboCup soccer) or even against humans (chess or Go).
Even foosball has emerged as a potential interim challenge, though in the long run it seems unlikely to offer enough complexity to demonstrate the advantages of neuromorphic engineering.
One advantage of competitions is that, rather than standardizing in arbitrary ways, individual research groups can use their creativity to forge the best system for their hardware, choosing their own encoding methods, learning rules, network architecture, and neuron-synapse model.
Where flexibility in the rules is needed, accommodations can be made or rejected in consultation with other players, who may themselves require restrictions to be lifted or relaxed. Done well, that approach could provide a more creative and more level playing field, thereby advancing neuromorphic technology.
This article was originally published on EE Times.
Dr. Sunny Bains teaches at University College London, is author of Explaining the Future: How to Research, Analyze, and Report on Emerging Technologies, and is currently writing a book on neuromorphic engineering.