DVS has a fighting chance to become a viable, mass-production vision sensor technology.
Roughly a billion dollars of investment in CMOS image sensors (CIS) over the past 20 years has led to the current market, where these beautiful imagers are produced by the billions each year. As CIS became a commodity, neuromorphic silicon retina "event camera" development languished, only gaining industrial traction recently, when Samsung and Sony applied their state-of-the-art image sensor process technologies to it.
Our event camera, introduced at the 2006 ISSCC conference, had huge 40-um pixels fabricated in a 350-nm process. Even then, CIS pixels were down to a few microns. In 2017, Samsung published an ISSCC paper on a 9-um pixel, back-illuminated VGA dynamic vision sensor (DVS) built in their 90-nm CIS fab. Meanwhile, Insightness announced a clever dual intensity + DVS pixel measuring a mere 7.2 um.
Both Samsung and Sony have built DVS with pixels under 5 um based on stacked technologies, where the back-illuminated 55-nm photosensor wafer is copper-bumped to a 28-nm readout wafer.
Amazing increases in event readout speed have also resulted from industrial development. These clever designs are bringing DVS pixels down to the pixel sizes of standard global-shutter machine vision and automotive cameras. This means that DVS has a fighting chance to establish itself as a viable mass-production vision sensor technology in the same "megapixel race" that has consumed CIS for decades.
The development of neuromorphic silicon retinas is a great example of faith meeting practical reality. The development of silicon retina event cameras goes back to 1989, with Kunihiko Fukushima's Reticon and the work of Carver Mead and Misha Mahowald at Caltech in the early 1990s.
I joined this effort as a graduate student at Caltech with Mahowald and Mead as mentors. We neuromorphic engineers believed we could build a camera that worked like the biological eye. The reality after a decade of early work was that our "silicon retina" pixels were vastly too big (i.e., expensive) and too noisy (i.e., they made terrible pictures).
Just as important, they didn’t offer sufficient advantage over CIS.
All this early development was taking place concurrently with the constant improvement of CIS. A breakthrough of sorts occurred during our work on the European project called CAVIAR, when Patrick Lichtsteiner and I came up with the DVS pixel circuit. Anton Civit assisted me in building the first USB DVS camera. We sold several hundred 128-x-128-pixel DVS cameras to early adopters in the neuromorphic community who were not ASIC developers. This pixel architecture is the foundation of all subsequent generations from all the major players (even when they don't say so on their websites).
The DVS brings a “unique selling proposition” over previous silicon retinas and standard cameras, owing to its combination of sparse, quick spiking output that responds reliably to low contrast natural scenes while offering great dynamic range and speed. Early DVS cameras allowed neuromorphic researchers to play with the technology to determine its potential.
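The operating principle behind that selling proposition can be summarized in a few lines: each DVS pixel independently watches log intensity and emits a sparse ON or OFF event each time brightness changes by a fixed contrast threshold, rather than reporting absolute intensity every frame. The sketch below is a minimal conceptual model of a single pixel, not the actual pixel circuit; the function name, the `threshold` value, and the idealized noiseless sampling are all illustrative assumptions.

```python
import math

def dvs_pixel_events(samples, threshold=0.2):
    """Conceptual single-pixel DVS model (illustrative, not the real circuit):
    emit an ON (+1) or OFF (-1) event each time log intensity moves by
    `threshold` away from the level memorized at the last event."""
    events = []
    ref = math.log(samples[0])  # memorized log intensity at last event
    for t, intensity in enumerate(samples[1:], start=1):
        log_i = math.log(intensity)
        while log_i - ref >= threshold:   # brightness rose: ON event(s)
            ref += threshold
            events.append((t, +1))
        while ref - log_i >= threshold:   # brightness fell: OFF event(s)
            ref -= threshold
            events.append((t, -1))
    return events

# Doubling the intensity (log change ~0.69) crosses a 0.2 threshold 3 times,
# and halving it back produces the mirror-image OFF events; a static scene
# produces no events at all.
print(dvs_pixel_events([1.0, 2.0, 2.0, 1.0], threshold=0.2))
# → [(1, 1), (1, 1), (1, 1), (3, -1), (3, -1), (3, -1)]
```

Because events are triggered by log-intensity *ratios*, the same scene contrast produces the same events in dim or bright light, which is the source of the wide dynamic range mentioned above.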
A decade later, conventional machine vision and robotics researchers did the same. This would not have happened without my students Patrick Lichtsteiner, Raphael Berner, and Christian Brandli, who now lead several startups. The other key was long-term support from UZH and ETH for basic technology development, and funding from the European Commission's Future and Emerging Technologies initiative.
Similar to what occurred with CMOS image sensors, event camera startups like Insightness (recently acquired by Sony), iniVation (who carry on the iniLabs mission), Shanghai-based CelePixel and well-heeled Prophesee are established, with real products to sell. Others will surely follow.
Recently, mainstream computer vision researchers introduced to event cameras (mainly via academic collaboration or through our neuromorphic workshops) have published compelling results derived from them. In these communities, "event camera" has become synonymous with the DVS and its event-based sensitivity to changes in brightness. That's the case even though the original neuromorphic definition included a much broader class of vision sensors capable of mimicking the computational power of biological retinas.
The last 30 years have seen steady development of event camera technologies. If the fallout from COVID-19 does not delay further research, we may see mass production of these sensors in the next few years (consumers can find Samsung DVS products on store shelves already). Along the way we have had lots of fun playing with DVS at many different levels, ranging from device physics and CMOS circuit design to complete robotic systems driven by AI.
I now think of DVS development as mainly an industrial enterprise, but its heavy focus on sparse computing has led us over the last five years to exploit activation sparsity in hardware AI accelerators. Like the spiking networks in our brains, these AI accelerators only compute when needed. This approach, promoted for decades by neuromorphic engineers, is finally gaining traction in mainstream electronics.
The fundamental neuromorphic organizing principle (as Carver Mead might put it) is to compute only where and when needed.
—Tobi Delbruck is a Professor of Physics and Electrical Engineering at the Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland.