Abstract:
Self-driving cars are an active area of interdisciplinary research spanning Artificial
Intelligence (AI), the Internet of Things (IoT), embedded systems, and control engineering. One
crucial component of autonomous navigation is accurately detecting vehicles, pedestrians, and
other obstacles on the road and ascertaining their distance from the self-driving vehicle. The
primary algorithms employed for this purpose rely on either camera images or LiDAR data; a
third category of algorithms fuses these two sensor modalities. Sensor fusion networks take 2D
camera images and LiDAR point clouds as input and output 3D bounding boxes as detection
results. In this thesis, we categorize object detection networks on the basis of their input data
and experimentally evaluate the performance of three object detection methods: YOLOv3, a
Bird's Eye View (BEV) network, and PointFusion.
We compare the three object detection networks with respect to the following metrics:
accuracy, performance in occluded environments, and computational complexity. The results of
the existing methods are replicated on the KITTI benchmark dataset, a standard dataset widely
used in vehicle detection research, to highlight their differences. The Average Precision
achieved by YOLOv3, BEV, and PointFusion is 42%, 45%, and 47.8%, respectively. Qualitative
and quantitative results show that the sensor fusion network outperforms the single-input
networks.