Abstract:
Our framework consists of a multi-object tracking module and a video segmentation module, which first run independently on our dataset. The tracking module imposes higher-order constraints on the smoothness of the object tracks and obtains high-quality trajectories through an iterative solution method based on Lagrangian relaxation. In the segmentation module, foreground and background superpixels are obtained by clustering, and a linear SVM is trained on the Lab color channels to obtain the final likelihood of each superpixel belonging to the foreground. Optical flow and color are used as features to assign the superpixels to a target. Both modules provide the locations and IDs of the targets across the video; hence, errors in one module can be corrected using the results of the other. In the joint processing module of our framework, the locations of the tracking bounding boxes are refined using the segmentation results so that they are positioned more accurately on the targets and include as few background pixels as possible. ID switches in the segmentation module are corrected using the tracking results, which are more accurate in this respect. Target detections initially missed by one module are added to its results with the help of the other. Hence, this joint processing improves the accuracy of both the segmentation and the tracking.
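
As an illustration of the segmentation step summarized above, the following sketch scores superpixels with a linear SVM trained on mean Lab color features. It is not the paper's implementation: the superpixel extraction (SLIC), the use of per-superpixel mean Lab color as the feature, and the foreground/background labels assumed to come from the clustering step are all illustrative assumptions.

```python
import numpy as np
from skimage import color, segmentation
from sklearn.svm import LinearSVC


def superpixel_lab_features(rgb_image, n_segments=300):
    """Mean Lab color per superpixel, plus the superpixel label map."""
    lab = color.rgb2lab(rgb_image)
    labels = segmentation.slic(rgb_image, n_segments=n_segments, start_label=0)
    feats = np.array([lab[labels == s].mean(axis=0) for s in np.unique(labels)])
    return feats, labels


def train_foreground_svm(lab_feats, fg_labels):
    """lab_feats: N x 3 Lab features; fg_labels: 1 = foreground, 0 = background
    (assumed here to come from the clustering step mentioned in the abstract)."""
    clf = LinearSVC(C=1.0)
    clf.fit(lab_feats, fg_labels)
    return clf


def foreground_likelihood(clf, lab_feats):
    # Signed distance to the SVM hyperplane, used as a foreground score.
    return clf.decision_function(lab_feats)
```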
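
Similarly, a minimal sketch of the box-refinement idea in the joint processing module: tightening a tracker bounding box around the target's segmentation mask so that it covers fewer background pixels. The box format `(x0, y0, x1, y1)` and the per-target boolean mask are assumptions for illustration, not the paper's data structures.

```python
import numpy as np


def refine_box_with_mask(box, mask):
    """box: (x0, y0, x1, y1) from the tracker; mask: HxW boolean array of the
    pixels assigned to this target by the segmentation module."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return box  # no segmentation support: keep the tracker box
    seg_box = (xs.min(), ys.min(), xs.max(), ys.max())
    # Intersect the tracker box with the tight box around the mask, falling
    # back to the segmentation box if the intersection is empty.
    x0, y0 = max(box[0], seg_box[0]), max(box[1], seg_box[1])
    x1, y1 = min(box[2], seg_box[2]), min(box[3], seg_box[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else seg_box
```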