dc.description.abstract |
Neural Networks (NNs) are the core algorithms behind many complex Artificial Intelligence (AI)
applications, such as image and video classification, recognition, and signal processing. However,
these algorithms are both memory- and compute-intensive, making them hard to deploy on
systems with limited hardware resources. Such deployments are also power-hungry, demanding a
substantial amount of energy to perform the required computations. Approximate Computing (AC)
has been gaining prominence as a way to relieve the computational and memory requirements of
Deep Neural Networks (DNNs) by exploiting their inherent error tolerance. AC techniques can be
divided into two categories: hardware-level and software-level approximations. In this research, we
consider the optimization of Convolutional Neural Network (CNN) algorithms for hardware
platforms, specifically FPGAs. To this end, we propose multiple automated tools. The first tool
addresses memory optimization through software-level approximation, using a genetic algorithm
(GA) to estimate the best number of levels (N) for quantizing and encoding weights with respect to
user-defined requirements. The GA uses a regression equation to determine the best population.
This module was tested on VGG-16, CIFAR-Quick, and LeNet-5, returning a final population with a
maximum absolute error of 0.038 with respect to the originally quantized weights. Second, we
propose an efficient decoder based on Canonical Huffman coding for the efficient decompression
of CNN weights. The design uses hash functions to decode weights directly, eliminating the need
to search a dictionary, and decodes a single weight in a single clock cycle. It achieves a maximum
frequency of 408.97 MHz while utilizing 1% of the system LUTs on the Artix-7 platform. Lastly, the
third module estimates the best approximate multipliers for hardware. It is also based on a GA and
uses the tf-approximate library to compute the accuracy loss incurred in a model by approximate
multipliers. We observed that the complex CNN model VGG-16 required more iterations to
determine the best multipliers than simpler models such as LeNet-5. |
en_US |