Accelerated Inference Framework of Sparse Neural Network Based on Nested
Bitmask Structure
Abstract
In order to satisfy the ever-growing demand for high-performance neural network processors, state-of-the-art processing units tend to replace the Processing Engines (PEs) of GPUs with application-oriented circuits in scenarios where low-power solutions are required. Such an application-oriented PE is fully optimized at the circuit-architecture level and eliminates false data dependencies and instruction redundancy. In this paper, we propose a novel encoding approach for a sparse neural network after pruning. We partition the weight matrix into blocks and use a low-rank binary map to indicate which blocks are valid, i.e., contain nonzero weights. Furthermore, the elements of each nonzero block are encoded into two substructures: one is a binary stream that marks the zero/nonzero positions, while the other holds the nonzero elements themselves, stored in a FIFO. In the experiments, we implement a well-pretrained sparse neural network on the Xilinx VC707 FPGA. Experimental results show that our algorithm outperforms the other benchmarks: our approach improves both the throughput and the energy efficiency of processing a single frame. Accordingly, we contend that the Nested Bitmask Neural Network (NBNN) is an efficient neural network structure that incurs only minor accuracy loss on an SoC system.
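
To make the nested (two-level) encoding concrete, the following minimal Python sketch illustrates one plausible reading of the scheme described above; the block size, the NumPy-based representation, and the name encode_nbnn are our illustrative assumptions rather than the paper's actual implementation.

    import numpy as np

    def encode_nbnn(weights, block=2):
        """Two-level (nested) bitmask encoding of a pruned weight matrix.

        Level 1: a coarse bitmap marking which block-by-block tiles
                 contain any nonzero weight.
        Level 2: for each nonzero tile, a fine zero/nonzero bitmap plus
                 the nonzero values appended to a FIFO in scan order.
        (Sketch under assumed conventions, not the paper's code.)
        """
        rows, cols = weights.shape
        assert rows % block == 0 and cols % block == 0
        block_map = []   # level-1 bitmap: 1 iff the tile holds a nonzero
        fine_masks = []  # level-2 bitmaps, one per nonzero tile
        value_fifo = []  # pure nonzero weights in tile-scan order
        for r in range(0, rows, block):
            for c in range(0, cols, block):
                tile = weights[r:r + block, c:c + block]
                if np.any(tile):
                    block_map.append(1)
                    fine_masks.append((tile != 0).astype(np.uint8))
                    value_fifo.extend(tile[tile != 0].tolist())
                else:
                    block_map.append(0)  # all-zero tile: nothing else stored
        return block_map, fine_masks, value_fifo

    # Example: the all-zero 2x2 tile is skipped entirely, so only three
    # fine masks and four values are stored for this 4x4 matrix.
    w = np.array([[0, 5, 0, 0],
                  [0, 0, 0, 0],
                  [7, 0, 0, 2],
                  [0, 1, 0, 0]])
    block_map, fine_masks, value_fifo = encode_nbnn(w)
    print(block_map)   # [1, 0, 1, 1]
    print(value_fifo)  # [5, 7, 1, 2]

The nesting is what saves storage and decode work: an all-zero block costs a single bit at level 1, and fine-grained position masks are spent only on blocks that actually carry weights.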