Abstract:
Deep Neural Networks (DNNs) are, in general, highly compute- and memory-intensive and require significant hardware and energy/power resources to perform inference. Compression of deep neural networks reduces these requirements through pruning (the careful elimination of insignificant network connections) and weight sharing. Numerous works have designed hardware accelerators for compressed deep neural network inference; however, these works provide very limited information for fully reproducing their designs. The main goal of this thesis is to design the control and datapath logic of one of the state-of-the-art accelerators, the Efficient Inference Engine (EIE), such that it can efficiently perform sparse matrix-vector multiplication (SpMV) of a sparse weight matrix and a sparse activation vector while minimizing intermediate stalls. This thesis will also study different parameters of the EIE architecture in order to propose an inference engine that can, in general, offer superior efficiency compared to EIE.