NUST Institutional Repository

Employing Software and Hardware level Approximations for the optimization of Deep Neural Networks

Show simple item record

dc.contributor.author Tariq, Rimsha
dc.date.accessioned 2023-08-09T11:30:01Z
dc.date.available 2023-08-09T11:30:01Z
dc.date.issued 2022
dc.identifier.other 319467
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/36073
dc.description Supervisor: Dr. Sajid Gul Khawaja; Co-Supervisor: Dr. Farhan Hussain en_US
dc.description.abstract Neural Networks (NNs) are the core algorithms behind many complex Artificial Intelligence (AI) applications, such as image and video classification and recognition, signal processing, etc. However, these algorithms are both memory- and compute-intensive, making them hard to deploy on systems with limited hardware resources. Such systems are also extremely power hungry and require a substantial amount of energy to perform the necessary computations. Approximate Computing (AC) has been gaining prominence for relieving the computational and memory requirements of Deep Neural Networks (DNNs) by exploiting their inherent error tolerance. AC can be divided into two categories: hardware-level and software-level approximations. In this research, we consider the optimization of CNN algorithms for hardware platforms, specifically FPGAs, and propose multiple automated tools. The first tool addresses memory optimization through software-level approximation, estimating the best number of levels (N) for quantizing and encoding weights with respect to user-defined requirements using a genetic algorithm (GA). The GA uses a regression equation to evaluate candidate populations. This module was tested on VGG-16, CIFAR-Quick, and LeNet-5, returning a final population with an absolute maximum error of 0.038 when compared against the originally quantized weights. Second, we propose the design of an efficient decoder based on canonical Huffman coding for the decompression of weights in CNNs. The design uses hash functions to decode the weights directly, eliminating the need to search a dictionary, and decodes a single weight per clock cycle. It achieves a maximum frequency of 408.97 MHz while utilizing 1% of the system LUTs on the Artix-7 platform.
The third module deals with the selection of the best approximate multipliers for hardware. It is also based on a GA and uses the tf-approximate library to calculate the accuracy loss incurred in models by approximate multipliers. We observed that the complex CNN model VGG-16 required more iterations to determine the best multipliers than simpler models such as LeNet-5. en_US
dc.language.iso en en_US
dc.publisher College of Electrical & Mechanical Engineering (CEME), NUST en_US
dc.title Employing Software and Hardware level Approximations for the optimization of Deep Neural Networks en_US
dc.type Thesis en_US
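The abstract describes a GA that searches for the best number of quantization levels N under user-defined requirements. A minimal illustrative sketch of such a search follows; the fitness function here is a hypothetical stand-in for the thesis's regression equation (it trades encoded bit-width against a coarse quantization-error proxy, with purely illustrative coefficients), not the actual model from the work.

```python
import math
import random

# Hypothetical fitness: penalize bits per encoded weight, and penalize
# exceeding a user-defined error target. Coefficients are illustrative only.
def fitness(n_levels, target_error=0.05):
    bits = math.ceil(math.log2(n_levels))      # bits needed per encoded weight
    quant_error = 1.0 / n_levels               # crude quantization-error proxy
    return -(0.1 * bits + 10.0 * max(0.0, quant_error - target_error))

def genetic_search(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [rng.randint(2, 256) for _ in range(pop_size)]   # candidate N values
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]         # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2               # crossover: mean of parents
            if rng.random() < 0.2:             # mutation: nudge the level count
                child = min(256, max(2, child + rng.randint(-8, 8)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best_levels = genetic_search()
```

Under this toy fitness the search settles on the smallest bit-width whose error proxy stays under the target; the thesis's regression-based fitness would replace `fitness` wholesale.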
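The decoder module in the abstract is an FPGA design; as a software illustration of the canonical-Huffman property it relies on, here is a minimal Python sketch in which the symbol index is computed arithmetically from small per-length tables, so no dictionary search is needed. All function names are hypothetical and this is not the thesis's hardware design.

```python
from collections import Counter

def canonical_tables(lengths):
    """Per-length first_code / first_index tables for a canonical Huffman code.
    lengths[s] is the code length (in bits) assigned to symbol s."""
    count = Counter(lengths)
    order = sorted(range(len(lengths)), key=lambda s: (lengths[s], s))
    first_code, first_index = {}, {}
    code = idx = 0
    for L in range(1, max(lengths) + 1):
        first_code[L], first_index[L] = code, idx
        code = (code + count.get(L, 0)) << 1
        idx += count.get(L, 0)
    return first_code, first_index, count, order

def encode(symbols, lengths):
    first_code, first_index, _, order = canonical_tables(lengths)
    pos = {s: i for i, s in enumerate(order)}
    bits = []
    for s in symbols:
        L = lengths[s]
        c = first_code[L] + (pos[s] - first_index[L])
        bits.extend(int(b) for b in format(c, f"0{L}b"))
    return bits

def decode(bits, lengths):
    first_code, first_index, count, order = canonical_tables(lengths)
    out, code, L = [], 0, 0
    for bit in bits:
        code, L = (code << 1) | bit, L + 1
        offset = code - first_code.get(L, 0)
        if 0 <= offset < count.get(L, 0):
            # Arithmetic "hash": the offset indexes directly into the run of
            # length-L symbols -- no dictionary lookup or tree walk required.
            out.append(order[first_index[L] + offset])
            code = L = 0
    return out
```

Because the lookup per accepted code is a single bounded comparison plus an index computation, the same structure maps naturally onto the one-weight-per-clock-cycle hardware behavior the abstract reports.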



This item appears in the following Collection(s)

  • MS [329]

