Abstract
We present the first gesture recognition system implemented end-to-end on event-based hardware, using a TrueNorth neurosynaptic processor to recognize hand gestures in real time and at low power from events streamed live by a Dynamic Vision Sensor (DVS). The biologically inspired DVS transmits data only when a pixel detects a change, unlike traditional frame-based cameras, which sample every pixel at a fixed frame rate. This sparse, asynchronous data representation lets event-based cameras operate at much lower power than frame-based cameras. However, much of the energy efficiency is lost if, as in previous work, the event stream is interpreted by conventional synchronous processors. Here, for the first time, we process a live DVS event stream using TrueNorth, a natively event-based processor with 1 million spiking neurons. Configured as a convolutional neural network (CNN), the TrueNorth chip identifies the onset of a gesture with a latency of 105 ms while consuming less than 200 mW. The CNN achieves 96.5% out-of-sample accuracy on a newly collected DVS dataset (DvsGesture) comprising 11 hand gesture categories from 29 subjects under 3 illumination conditions.
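The sketch below illustrates how the DVS output differs from frame-based sampling. It uses a common idealized pixel model, which is an assumption here and not the sensor circuit or processing pipeline used in this work. Each pixel stores a reference log intensity and emits an ON (+1) or OFF (-1) event only when its current log intensity moves by more than a contrast threshold from that reference. The helper name `frames_to_events`, the threshold of 0.2, and the `eps` offset are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

# An event is (t, x, y, polarity): +1 = ON (brighter), -1 = OFF (darker).
Event = Tuple[float, int, int, int]


def frames_to_events(frames: List[List[List[float]]], times: List[float],
                     threshold: float = 0.2, eps: float = 1e-3) -> List[Event]:
    """Hypothetical idealized DVS emulation from a sequence of intensity frames.

    Each pixel keeps the log intensity at its last event as a reference.
    An event fires only when the log intensity departs from that reference
    by at least `threshold`. Pixels that do not change produce no output,
    and this is where the sparsity of the representation comes from.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Initialize each pixel's reference from the first frame (eps avoids log(0)).
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)]
           for y in range(height)]
    events: List[Event] = []
    for frame, t in zip(frames[1:], times[1:]):
        for y in range(height):
            for x in range(width):
                delta = math.log(frame[y][x] + eps) - ref[y][x]
                # Emit one event per threshold crossing and step the reference toward the new value.
                while abs(delta) >= threshold:
                    polarity = 1 if delta > 0 else -1
                    events.append((t, x, y, polarity))
                    ref[y][x] += polarity * threshold
                    delta -= polarity * threshold
    return events


if __name__ == "__main__":
    # A 2x2 scene in which only pixel (x=1, y=0) doubles in brightness.
    frames = [[[1.0, 1.0], [1.0, 1.0]],
              [[1.0, 2.0], [1.0, 1.0]]]
    print(frames_to_events(frames, [0.0, 0.01]))
    # -> [(0.01, 1, 0, 1), (0.01, 1, 0, 1), (0.01, 1, 0, 1)]
```

A frame-based camera would read out all four pixels at every frame. In this model, only the single pixel that changed produces output. Because its log intensity rose by about 0.69, it fires three ON events at the 0.2 threshold.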