NUST Institutional Repository

Optimization and Compression of CNNs for Reconfigurable Architectures


dc.contributor.author Riaz, Syed Morsleen
dc.date.accessioned 2023-07-25T05:22:19Z
dc.date.available 2023-07-25T05:22:19Z
dc.date.issued 2023
dc.identifier.other 329464
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/35023
dc.description Supervisor: Dr. Sajid Gul Khawaja; Co-Supervisor: Dr. M. Usman Akram en_US
dc.description.abstract Convolutional Neural Networks (CNNs) have achieved great success over the past decade on challenging classification problems. To attain high accuracy, CNN models must perform an enormous number of computations, which demands substantial memory and computation time. Graphical Processing Units (GPUs) are therefore commonly used to deploy CNNs, greatly improving overall performance. However, GPU deployment requires powerful host hardware and draws considerable power, which prevents its use with reconfigurable architectures; the same high resource and energy demands make such models hard to deploy on resource-restricted edge devices. There is thus a need for an efficient strategy for deploying CNNs on edge devices and reconfigurable architectures. CNNs are error-tolerant and can produce accurate predictions from approximate values. Exploiting this property, we present a memory- and resource-efficient technique for the compression and optimization of convolutional neural networks. Our main goal is to reduce the computational complexity and memory consumption of a CNN architecture while preserving the model's overall accuracy. To this end, we propose a collaborative network compression strategy in which pruning-based compression (PBC) is first applied to lower the computational complexity of the model: PBC prunes the weight parameters of the network's layers to achieve maximum compression, producing a compressed model. The pruned model is then divided into two sub-networks, a uniform (non-sparse) network and a sparse network, which are further optimized and compressed with the proposed optimization techniques. Because of its uneven weight distribution, the layers of the sparse network are quantized using optimization of network by incremental quantization (ONIQ). The layers of the uniform network are quantized using the proposed optimization of network by optimized quantization (ONOQ), which quantizes the weight values to optimal levels determined by an optimizer; these optimal levels yield the best possible trade-off between compression ratio and model accuracy. We apply the proposed strategy to LeNet-5 trained on MNIST, Cifar-Quick trained on CIFAR-10, and VGG-16 trained on the ImageNet ILSVRC2012 dataset, and outperform state-of-the-art techniques, achieving a high compression ratio with only a very slight drop in accuracy. en_US
dc.language.iso en en_US
dc.publisher College of Electrical & Mechanical Engineering (CEME), NUST en_US
dc.subject Keywords: Network Compression, Deep Learning, Neural Networks, Reconfigurable Architecture, Approximate Computing, Memory Optimization en_US
dc.title Optimization and Compression of CNNs for Reconfigurable Architectures en_US
dc.type Thesis en_US
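
The abstract above names the pipeline's three steps but gives no implementation detail. The following is a minimal, self-contained Python/NumPy sketch of the general ideas it describes: magnitude pruning standing in for PBC, power-of-two incremental quantization as an ONIQ-style step for the sparse sub-network, and a brute-force search over uniform quantization levels as an ONOQ-style step for the non-sparse sub-network. All function names, thresholds, the power-of-two codebook, and the sparsity-based routing rule are illustrative assumptions, not the thesis's actual algorithms.

import numpy as np

def prune_by_magnitude(weights, prune_fraction=0.7):
    # Hypothetical stand-in for the PBC step: zero out the
    # smallest-magnitude fraction of a layer's weights.
    flat = np.abs(weights).ravel()
    k = int(prune_fraction * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def incremental_pow2_quantize(weights, group_fractions=(0.5, 0.75, 1.0)):
    # ONIQ-style sketch: quantize weights to signed powers of two in
    # groups, largest magnitudes first; a real pipeline would retrain
    # the still-unquantized float weights after each group.
    w = weights.astype(np.float64).copy()
    flat = w.ravel()                      # view into w
    order = np.argsort(-np.abs(flat))     # largest-magnitude first
    done = 0
    for frac in group_fractions:
        upto = int(frac * flat.size)
        idx = order[done:upto]
        vals = flat[idx]
        nz = vals != 0                    # pruned zeros stay zero
        exps = np.round(np.log2(np.abs(vals[nz])))
        vals[nz] = np.sign(vals[nz]) * 2.0 ** exps
        flat[idx] = vals
        done = upto
        # (retraining of the remaining weights would happen here)
    return w

def optimized_uniform_quantize(weights, candidate_levels=(4, 8, 16, 32)):
    # ONOQ-style sketch: a brute-force "optimizer" that picks the number
    # of uniform levels minimizing mean squared reconstruction error.
    lo, hi = float(weights.min()), float(weights.max())
    if hi == lo:
        return weights.copy(), 1
    best_err, best_q, best_levels = None, None, None
    for levels in candidate_levels:
        step = (hi - lo) / (levels - 1)
        q = lo + step * np.round((weights - lo) / step)
        err = float(np.mean((weights - q) ** 2))
        if best_err is None or err < best_err:
            best_err, best_q, best_levels = err, q, levels
    return best_q, best_levels

# Toy usage on one random "layer": prune, then route the layer to the
# sparse (ONIQ) or uniform (ONOQ) branch based on post-pruning sparsity.
rng = np.random.default_rng(0)
layer = rng.normal(size=(64, 64))
pruned = prune_by_magnitude(layer, prune_fraction=0.7)
sparsity = float(np.mean(pruned == 0))
if sparsity > 0.5:
    compressed = incremental_pow2_quantize(pruned)
else:
    compressed, levels = optimized_uniform_quantize(pruned)
print(f"sparsity={sparsity:.2f}, unique values={np.unique(compressed).size}")

In the full strategy each quantization stage would be interleaved with retraining, and the compression ratio would be measured from the resulting sparsity and weight sharing; those steps are omitted here for brevity.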



This item appears in the following Collection(s)

  • MS [331]
