Abstract:
AI-based machine learning algorithms, which form the backbone of modern
object detection systems, are computationally demanding, so general-purpose
low-power embedded devices struggle to achieve the expected throughput.
This has led to a growing trend toward specialized hardware architectures
for AI algorithms.
GPUs are the most common AI accelerators because they greatly increase
computational speed, although they were originally developed for graphics
rendering. For small models, CPUs can be used instead of GPUs. FPGAs are
another good option for executing AI algorithms. Whereas GPUs and CPUs do
not provide an efficient strategy for pipelining and parallel processing,
FPGAs provide pipeline parallelism, which makes them much more efficient
than GPUs and CPUs [7]. The FPGA data-processing pipeline is efficient
because it needs neither the typical control and instruction-fetch units
nor register write-back and other execution overheads [7]. FPGAs are also
reprogrammable.
However, engineers usually prefer GPUs over FPGAs for one major reason:
deploying AI algorithms on FPGAs is more difficult than on GPUs. Xilinx
addresses this difficulty with Vitis-AI, a platform that helps execute AI
algorithms on Xilinx development boards. The goal of this thesis is to
compare the performance of similar AI algorithms on a GPU and an FPGA.
The FPGA platform used in this thesis is the Xilinx Zynq UltraScale+ MPSoC ZCU102.
The GPU platform is the NVIDIA Jetson Nano. Three of the most commonly
used object classification algorithms were shortlisted: ResNet-50,
Inception and SqueezeNet. They were selected because they are widely used
in object classification applications, which are the main focus of the
comparison in this thesis. The comparison is based on three metrics:
accuracy, inference time and frames per second (FPS). Accuracy indicates
how reliably an algorithm classifies the object in an image, inference
time is how long the algorithm takes to produce its result for an input,
and FPS is the frame rate, i.e., the number of frames processed per
second. In this work, the FPGA outperformed both the GPU and the CPU on
the three selected metrics.
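The three metrics can be made concrete with a small measurement harness.
The following is a minimal Python sketch, not the benchmarking code used in
this thesis: the `classify` callable, the toy data and the warm-up count are
hypothetical, and it assumes single-image (batch size 1) sequential
inference, in which case FPS is the reciprocal of the mean inference time.

```python
import time
from typing import Callable, Sequence, Tuple


def benchmark(
    classify: Callable[[object], int],
    samples: Sequence[Tuple[object, int]],
    warmup: int = 1,
) -> Tuple[float, float, float]:
    """Return (top-1 accuracy, mean inference time in s, frames per second).

    Assumption: images are classified one at a time (batch size 1), so FPS
    is the number of frames divided by the total measured inference time.
    """
    if not samples:
        raise ValueError("samples must not be empty")

    # Hypothetical warm-up runs so one-time setup costs are not timed.
    for image, _ in samples[:warmup]:
        classify(image)

    correct = 0
    total_time = 0.0
    for image, label in samples:
        start = time.perf_counter()
        predicted = classify(image)  # only the inference call is timed
        total_time += time.perf_counter() - start
        if predicted == label:
            correct += 1

    n = len(samples)
    accuracy = correct / n
    mean_inference_time = total_time / n
    fps = n / total_time if total_time > 0 else float("inf")
    return accuracy, mean_inference_time, fps


if __name__ == "__main__":
    # Toy example: a dummy "classifier" that is wrong on the last sample only.
    data = [(i, i % 3) for i in range(9)] + [(9, 1)]
    acc, latency, fps = benchmark(lambda x: x % 3, data)
    print(f"accuracy={acc:.2f}")  # prints accuracy=0.90
    print(f"mean inference time={latency * 1e6:.2f} us, FPS={fps:.0f}")
```

Timing only the inference call excludes pre- and post-processing; whether
those steps are included changes the reported inference time and FPS. With
batched or pipelined execution, throughput can exceed 1/latency, so FPS
would then need to be measured over the whole run instead of derived from
per-image latency.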