Abstract:
Convolutional Neural Networks (CNNs) have achieved great success over the past decade
in solving challenging classification problems. To attain high accuracy, CNN models
must perform a very large number of computations, which demand substantial memory
storage and computation time. Graphics Processing Units (GPUs) are therefore commonly
used to deploy CNNs, improving overall performance and hiding much of this
computational cost. However, GPU-based implementations require a capable processor and
draw considerable power, and the high resource and energy demands of these models make
them hard to map onto reconfigurable architectures and unsuitable for resource-restricted
edge devices. An efficient strategy is therefore needed for deploying CNNs effectively
on edge devices and reconfigurable architectures. CNNs exhibit error-tolerant behavior
and can still predict correctly from approximate values.
Exploiting this property, we present a memory-efficient and resource-aware technique
for the compression and optimization of convolutional neural networks. Our main goal is
to reduce the computational complexity and memory consumption of a CNN architecture
while preserving the model's overall accuracy. To accomplish this objective, we propose
a collaborative network compression strategy in which pruning-based compression (PBC)
is first applied to lower the computational complexity of the model. PBC prunes the
weight parameters of the layers in the network to achieve maximum compression and
produces the compressed model.
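As a rough illustration of this pruning step, the sketch below applies magnitude-based weight pruning to a single layer with NumPy; the quantile threshold and the prune_rate value are illustrative assumptions, not the exact PBC criterion.

```python
import numpy as np

def prune_layer(weights, prune_rate=0.9):
    """Zero out the smallest-magnitude weights of one layer.

    prune_rate is the fraction of weights removed; the actual PBC
    criterion may differ (this threshold rule is an assumption).
    """
    threshold = np.quantile(np.abs(weights), prune_rate)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Example: prune a hypothetical bank of 16 convolution kernels of shape 3x5x5.
rng = np.random.default_rng(0)
conv_weights = rng.normal(size=(16, 3, 5, 5)).astype(np.float32)
pruned, mask = prune_layer(conv_weights, prune_rate=0.9)
print("non-zero weights kept:", int(mask.sum()), "of", mask.size)
```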
In the next step, the pruned model is divided into two sub-networks, i.e., a uniform
(non-sparse) network and a sparse network, and both are further optimized and compressed
with the proposed optimization techniques. Because of the uneven weight distribution in
the sparse network, optimization of the network by incremental quantization (ONIQ) is
used to quantize its layers.
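A minimal sketch of what such incremental quantization can look like is given below, assuming ONIQ quantizes the surviving sparse weights to powers of two in growing fractions, largest magnitudes first; the step schedule, exponent range, and omission of retraining are all assumptions for illustration.

```python
import numpy as np

def nearest_power_of_two(w, min_exp=-6, max_exp=-1):
    """Map one weight to the nearest signed power of two (zeros stay zero)."""
    if w == 0.0:
        return 0.0
    exp = np.clip(np.round(np.log2(abs(w))), min_exp, max_exp)
    return float(np.sign(w) * 2.0 ** exp)

def incremental_quantize(weights, steps=(0.5, 0.75, 1.0)):
    """Freeze growing fractions of the largest-magnitude weights to powers of two."""
    q = weights.copy()
    order = np.argsort(-np.abs(q).ravel())            # largest magnitude first
    n_nonzero = int((q != 0).sum())
    for fraction in steps:
        for flat_idx in order[: int(fraction * n_nonzero)]:
            i = np.unravel_index(flat_idx, q.shape)
            if q[i] != 0.0:
                q[i] = nearest_power_of_two(q[i])
        # In a full pipeline, the still-unquantized weights would be
        # retrained here before the next increment.
    return q
```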
Similarly, optimization of the network by optimized quantization (ONOQ) is proposed and
applied to the layers of the uniform network, quantizing the weight values to the optimal
levels determined by an optimizer. The optimizer extracts the best levels for quantizing
the weight parameters, and these optimal levels are used to obtain the best possible
trade-off between compression ratio and model accuracy.
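The abstract does not specify the optimizer, so the sketch below stands in with a simple k-means-style search for a small set of shared levels and then snaps every weight of a dense layer to its nearest level; the number of levels and the initialization are assumptions.

```python
import numpy as np

def optimize_levels(weights, n_levels=8, iters=50):
    """Search for n_levels shared quantization values with a k-means-style loop
    (an assumed stand-in for the paper's level optimizer)."""
    w = weights.ravel()
    levels = np.quantile(w, np.linspace(0.0, 1.0, n_levels))   # initial guess
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - levels[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(assign == k):
                levels[k] = w[assign == k].mean()
    return levels

def quantize_to_levels(weights, levels):
    """Snap every weight to its closest optimized level."""
    flat = weights.ravel()
    assign = np.argmin(np.abs(flat[:, None] - levels[None, :]), axis=1)
    return levels[assign].reshape(weights.shape)

# Example on a hypothetical dense (uniform) layer; sweeping n_levels and
# re-measuring accuracy would expose the compression/accuracy trade-off.
rng = np.random.default_rng(0)
fc_weights = rng.normal(scale=0.05, size=(256, 128)).astype(np.float32)
levels = optimize_levels(fc_weights)
quantized = quantize_to_levels(fc_weights, levels)
print("distinct weight values after quantization:", np.unique(quantized).size)
```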
We apply the proposed strategy to LeNet-5 trained on the MNIST dataset, Cifar-Quick
trained on the CIFAR-10 dataset, and the VGG-16 network trained on the ImageNet
ILSVRC2012 dataset. The proposed approach outperforms state-of-the-art techniques,
achieving a high compression ratio with only a very slight drop in accuracy.