Abstract:
We adopted the PointFusion methodology for fusing 2D image data and 3D LiDAR point
clouds through a neural network. The image is first passed through a 2D object detector,
in our case Faster R-CNN pretrained on MS COCO. The predicted 2D regions of interest are
used both to crop the image and to select the corresponding subset of the point cloud.
Each cropped image is passed through a ResNet model pretrained on ImageNet, and the final
average-pooled feature layer is extracted. The cropped point clouds, together with the
extracted image features, are then fed to a network composed of PointNet and PointFusion
fusion layers. The output is a 3D bounding box, represented as 8 corner points, for each
vehicle or pedestrian present in the image.
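The step of using a 2D region of interest to crop the point cloud can be illustrated with a minimal geometric sketch: project each LiDAR point into the image with the camera intrinsics and keep the points whose projection falls inside the detector's box. The function name, the toy intrinsic matrix, and the example points below are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def crop_points_by_2d_box(points, K, box):
    """Keep points whose camera projection falls inside a 2D box.

    points: (N, 3) array of points in camera coordinates (assumed already
            transformed from the LiDAR frame)
    K:      (3, 3) camera intrinsic matrix
    box:    (x1, y1, x2, y2) region of interest from the 2D detector
    """
    # Project 3D points to pixel coordinates: p = K @ X, then divide by depth
    proj = (K @ points.T).T
    uv = proj[:, :2] / proj[:, 2:3]
    x1, y1, x2, y2 = box
    in_front = points[:, 2] > 0  # discard points behind the camera
    in_box = ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
              (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    return points[in_front & in_box]

# Toy example with made-up intrinsics and three points
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
pts = np.array([[0.0, 0.0, 10.0],    # projects to the image centre (320, 240)
                [5.0, 0.0, 10.0],    # projects to (570, 240), outside the box
                [0.0, 0.0, -5.0]])   # behind the camera
roi = (300, 220, 340, 260)
cropped = crop_points_by_2d_box(pts, K, roi)
print(len(cropped))  # → 1 (only the first point survives the crop)
```

In the full pipeline, the surviving points would then be passed to the PointNet branch while the cropped image goes to the ResNet branch.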