dc.description.abstract |
Neural Networks (NNs) are the core algorithms behind many complex Artificial Intelligence (AI)
applications, such as image and video classification, recognition, and signal processing. However,
these algorithms are both memory- and compute-intensive, making them hard to deploy on
systems with limited hardware resources. Such deployments are also power-hungry, demanding a
substantial amount of energy to perform the required computations. Approximate Computing (AC)
has been gaining prominence as a way to relieve the computational and memory requirements of
Deep Neural Networks (DNNs) by exploiting their inherent error tolerance. AC techniques can be
divided into two categories: hardware-level and software-level approximations. In this research, we
consider the optimization of Convolutional Neural Network (CNN) algorithms for hardware
platforms, specifically FPGAs. To this end, we propose multiple automated tools. The first tool
addresses memory optimization through software-level approximation, using a genetic algorithm
(GA) to estimate the best number of levels (N) for quantizing and encoding weights with respect to
user-defined requirements. The GA uses a regression equation to determine the best population.
This module was tested on VGG-16, CIFAR-Quick, and LeNet-5, returning a final population with a
maximum absolute error of 0.038 with respect to the originally quantized weights. Second, we
propose an efficient decoder based on Canonical Huffman coding for the efficient decompression
of CNN weights. The design uses hash functions to decode weights directly, eliminating the need
to search a dictionary, and decodes a single weight in a single clock cycle. It achieves a maximum
frequency of 408.97 MHz while utilizing 1% of the system LUTs on the Artix-7 platform. Lastly, the
third module estimates the best approximate multipliers for hardware. It is also based on a GA and
uses the tf-approximate library to compute the accuracy loss incurred in a model by approximate
multipliers. We observed that the complex CNN model VGG-16 required more iterations to
determine the best multipliers than simpler models such as LeNet-5. |
en_US |