Performance Comparison of AI Algorithms on GPUs and FPGA Based AI Platforms

Shahzad, Mustafa

URI: http://10.250.8.41:8080/xmlui/handle/123456789/35238

Date: 2023

Abstract:

AI-based machine learning algorithms, which form the backbone of modern object detection systems, are computationally very demanding, so general-purpose low-power embedded devices struggle to achieve the expected throughput. In recent years there has therefore been a growing trend toward specialized architectures for AI algorithms. GPUs are the usual choice of AI accelerator because they greatly increase computational speed, although they were originally developed for graphics rendering. When the models are small, CPUs can be used instead of GPUs. Apart from GPUs and CPUs, FPGAs are a good option for executing AI algorithms. GPUs and CPUs do not provide an efficient strategy for pipelining and parallel processing, whereas FPGAs provide pipeline parallelism, which makes them much more efficient than GPUs and CPUs [7]. The data processing pipeline in an FPGA is efficient enough that it needs no typical control and instruction fetch units, and no register write-back or other execution overheads [7]. FPGAs have the added advantage of being re-programmable. The main reason engineers usually prefer GPUs over FPGAs, however, is that executing AI algorithms on FPGAs is more difficult than on GPUs. Xilinx addresses this difficulty with its Vitis-AI platform, which helps execute AI algorithms on Xilinx development boards.

The goal of this thesis is to compare the performance of similar AI algorithms on a GPU and an FPGA. The FPGA platform used in this thesis is the Xilinx Zynq UltraScale+ MPSoC ZCU102, and the GPU platform is the NVIDIA Jetson Nano. Three of the most commonly used algorithms for object classification were shortlisted: ResNet-50, Inception and SqueezeNet.
These algorithms were selected because they are the ones most commonly used in object classification applications, and the main focus of this thesis is the comparison of object classification algorithms. The comparison is based on three metrics: Accuracy, Inference Time and Frames Per Second (FPS). Accuracy indicates how reliably an algorithm classifies the object in an image. Inference Time is the time an algorithm takes to produce its result, and Frames Per Second is the frame rate, i.e. the number of frames processed per second. In the work carried out in this thesis, the FPGA turned out to be better than the GPU and CPU when compared on the three shortlisted metrics.
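The three metrics can be made concrete with a short Python sketch. It assumes that the predicted and ground-truth class indices and the per-image inference times have already been collected from the board; the names `summarize` and `BenchmarkResult` are hypothetical and are not part of Vitis-AI or the Jetson software stack, and the thesis's own measurement code may compute these figures differently. In particular, FPS is computed here from total wall-clock time, because a deployment that runs several inferences concurrently can reach a frame rate higher than one divided by the per-image latency.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class BenchmarkResult:
    accuracy: float           # fraction of images classified correctly (top-1)
    mean_inference_ms: float  # average per-image inference time in milliseconds
    fps: float                # frames processed per second of wall-clock time


def summarize(predictions: Sequence[int],
              labels: Sequence[int],
              latencies_s: Sequence[float],
              wall_clock_s: Optional[float] = None) -> BenchmarkResult:
    """Compute top-1 accuracy, mean inference time and FPS (hypothetical helper).

    predictions / labels: predicted and ground-truth class index per image.
    latencies_s: measured inference time of each image, in seconds.
    wall_clock_s: total elapsed time of the whole run. If omitted, images are
    assumed to be processed one after another, so the total is the sum of
    the individual latencies.
    """
    if not (len(predictions) == len(labels) == len(latencies_s)) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count images whose predicted class matches the ground truth.
    correct = sum(p == y for p, y in zip(predictions, labels))

    # Sequential execution unless a measured wall-clock time is supplied.
    total_s = wall_clock_s if wall_clock_s is not None else sum(latencies_s)

    return BenchmarkResult(
        accuracy=correct / len(labels),
        mean_inference_ms=mean(latencies_s) * 1000.0,
        fps=len(labels) / total_s,
    )


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the thesis.
    result = summarize(predictions=[3, 1, 4, 1],
                       labels=[3, 1, 5, 1],
                       latencies_s=[0.02, 0.02, 0.02, 0.02])
    print(result)
```

With strictly sequential, one-image-at-a-time execution, the FPS value reduces to the reciprocal of the mean inference time; passing a measured `wall_clock_s` keeps the two metrics distinct when the hardware overlaps work on consecutive frames.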