NUST Institutional Repository

A Highly Optimized Design Space Exploration Scheme for implementing Deep Convolution Neural Networks


dc.contributor.author M. Sohaib Ul Hassan
dc.date.accessioned 2021-01-12T10:01:43Z
dc.date.available 2021-01-12T10:01:43Z
dc.date.issued 2020
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/21007
dc.description Supervisor: Dr. Umar Shahbaz Khan; Co-Supervisor: Dr. Sajid Gul Khawaja en_US
dc.description.abstract The Convolutional Neural Network (CNN) is an important machine learning algorithm that has attracted broad attention in recent years owing to its wide range of applications and high classification accuracy. CNNs are computationally expensive and require extensive memory accesses, which makes them inefficient on general-purpose computers. GPU implementations improve performance, but the high energy consumption of GPUs prevents their use in robotics and mobile embedded platforms. This study presents the implementation details of mapping Convolutional Neural Networks onto field-programmable gate arrays (FPGAs). Visual Geometry Group (VGG-16) networks are among the most widely used CNN architectures; their uniform, regular structure makes them well suited to FPGA implementation, so a detailed discussion of mapping VGG-16-style networks onto an FPGA is presented. The Kaggle Flower Recognition dataset was used as a case study. A VGG-style network was trained on a Core i9 computer with an NVIDIA GTX 1660 GPU and achieved an accuracy of 90% on this dataset. Trained CNNs are algorithmically simple to model and deploy. A Xilinx Zynq ZedBoard was used for analytical modelling and mapping of the CNN. The trained CNN was partitioned into a hardware part, comprising the computationally intensive convolutions, and a software part, comprising less expensive tasks such as the pooling, fully connected, and SoftMax layers. The hardware part was mapped onto the Zynq PL and the software part onto the Zynq PS. Among the different parallelism opportunities in the CNN workload, the proposed methodology exploits inter-output parallelism in the hardware accelerator on the Zynq PL. The hardware design also takes the memory access patterns of the convolution operation into account and optimizes them to achieve good performance. For the complete network implementation, the proposed methodology achieved a peak performance of 1.3 GMACC/s at 120 MHz and a speed-up of 4x over a software implementation on a general-purpose computer. en_US
dc.publisher CEME, National University of Sciences and Technology, Islamabad en_US
dc.subject FPGA, Convolutional Neural Network, VGG-16, Zedboard en_US
dc.title A Highly Optimized Design Space Exploration Scheme for implementing Deep Convolution Neural Networks en_US
dc.type Thesis en_US
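
The key hardware idea in the abstract is inter-output parallelism: several output feature maps of a convolution layer are computed in parallel on the Zynq PL, since they all reuse the same input pixels. The thesis itself is not reproduced in this record, so the HLS-style C++ kernel below is only a minimal sketch of that idea under assumed parameters (3x3 kernels as in VGG-16, a hypothetical parallelism factor PO, and invented names such as conv_layer); it is not the author's actual accelerator design. As a rough consistency check, 1.3 GMACC/s at 120 MHz corresponds to about 1.3e9 / 120e6, i.e. roughly 11 multiply-accumulates per clock cycle.

```cpp
// Minimal sketch of a 3x3 convolution with inter-output parallelism.
// All names and sizes (PO, conv_layer, IC, OC, H, W) are illustrative
// assumptions, not taken from the thesis.
constexpr int PO = 8;  // assumed number of output channels computed in parallel

template <int IC, int OC, int H, int W>   // assumes OC is a multiple of PO
void conv_layer(const float in[IC][H][W],
                const float weights[OC][IC][3][3],
                float out[OC][H][W]) {
    for (int oc = 0; oc < OC; oc += PO) {          // tile over output channels
        for (int y = 1; y < H - 1; ++y) {          // interior pixels only
            for (int x = 1; x < W - 1; ++x) {
                float acc[PO] = {0.0f};            // one accumulator per parallel output
                for (int ic = 0; ic < IC; ++ic) {
                    for (int ky = 0; ky < 3; ++ky) {
                        for (int kx = 0; kx < 3; ++kx) {
                            // Each input pixel is fetched once and reused by all
                            // PO parallel output channels (inter-output parallelism);
                            // in an HLS flow the p-loop would be fully unrolled.
                            float px = in[ic][y + ky - 1][x + kx - 1];
                            for (int p = 0; p < PO; ++p) {
                                acc[p] += px * weights[oc + p][ic][ky][kx];
                            }
                        }
                    }
                }
                for (int p = 0; p < PO; ++p) {
                    out[oc + p][y][x] = acc[p];
                }
            }
        }
    }
}
```

For the first VGG-16 layer this would be instantiated roughly as conv_layer<3, 64, 224, 224>(...). In a real Zynq accelerator the weights and feature maps would be streamed from the PS over AXI and the loops pipelined and unrolled with HLS pragmas; those details, as well as padding of the border pixels, are omitted from this sketch.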


This item appears in the following Collection(s)

  • MS [205]
