Abstract:
Object recognition and tracking is one of the most active research areas in computer vision and deals with detecting instances of objects. It has many applications, such as image retrieval, video surveillance, and missile tracking.
The primary reason it remains such an active research area is that it touches nearly every walk of life. Object recognition deals with recognizing an object in a remote scene environment. Whenever an object with certain attributes is detected, everything else in the scene that does not belong to the object of interest is treated as noise. An algorithm must be not only precise in detecting the object in the scene but also fast. Similarly, in tracking, the object must be detected and recognized in consecutive frames without being lost to occlusion or distorted by noise. Tracking is mainly based on probabilistic estimation of the object's position in subsequent frames from its behavior in the current and previous frames.
In this thesis, an ensemble algorithm is proposed that first detects and recognizes the object and then calls the tracking routine. Image processing and pattern classification are used for both object recognition and tracking. The proposed system consists of several stages: preprocessing, region detection, feature extraction, classification, and finally tracking.
After the preprocessing stage, the main algorithm is divided into three parts. In the first phase, object recognition is performed using the Maximum Average Correlation Height (MACH) filter. MACH uses the average similarity measure (ASM) to determine the location of the object of interest. The filter is designed to maximize the average correlation peak height over the expected distortions while minimizing the ASM.
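The trade-off between peak height and ASM can be illustrated with a minimal sketch. It assumes the commonly used optimal trade-off form of the filter, in which each frequency component of the mean training spectrum is divided by a weighted sum of a noise term, the average power spectrum, and the ASM term. The weights `alpha`, `beta`, and `gamma`, the helper names, and the use of short 1-D signals in place of images are illustrative assumptions, not values or code taken from the thesis.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a short 1-D signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def mach_filter(training, alpha=1e-3, beta=0.1, gamma=1.0):
    """Frequency-domain filter h = mean / (alpha + beta * D + gamma * S).

    D is the average power spectrum and S the per-frequency variance of the
    training spectra around their mean (the ASM term). The weights are
    assumed values chosen only for illustration.
    """
    spectra = [dft(x) for x in training]
    count = len(spectra)
    h = []
    for k in range(len(spectra[0])):
        mean_k = sum(s[k] for s in spectra) / count
        power_k = sum(abs(s[k]) ** 2 for s in spectra) / count
        asm_k = sum(abs(s[k] - mean_k) ** 2 for s in spectra) / count
        h.append(mean_k / (alpha + beta * power_k + gamma * asm_k))
    return h

def correlate(signal, h):
    """Circular cross-correlation of the signal with the filter."""
    spectrum = dft(signal)
    plane = idft([spectrum[k] * h[k].conjugate() for k in range(len(h))])
    return [c.real for c in plane]

# Three slightly distorted views of the same 1-D "object", centred at index 3.
training = [
    [0, 0, 1.0, 3.0, 1.0] + [0] * 11,
    [0, 0, 1.2, 2.6, 1.2] + [0] * 11,
    [0, 0, 0.8, 3.3, 0.8] + [0] * 11,
]
h = mach_filter(training)

# Scene: the first view moved 5 samples to the right (circular shift).
scene = training[0][-5:] + training[0][:-5]
plane = correlate(scene, h)
peak = max(range(len(plane)), key=lambda i: plane[i])
print(peak)  # index of the correlation peak = displacement from the training position
```

In a 2-D setting the same per-frequency division would be applied to image spectra, and the location of the correlation peak would give the position of the object in the frame.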
MACH is applied to the first frame of the video sequence to recognize the object of interest. Once recognized, the object is passed to the second phase of the algorithm, which is based on the Affine Scale Invariant Feature Transform (ASIFT). As the name implies, ASIFT is a fully affine-invariant algorithm that covers all six affine parameters: zoom, rotation, translation (2 parameters), and the camera axis orientation (2 parameters). ASIFT is used in this project primarily to detect the object once its location changes drastically.
MACH tends to falter when the object changes one of the six parameters mentioned earlier between successive frames. If the object of interest rotates or zooms in or out drastically, ASIFT can still recognize it and feed the result to the tracking algorithm.
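The six affine parameters can be made concrete with a short sketch. It assumes the usual ASIFT camera model, in which the linear part of an affine map is written as zoom times a rotation (camera spin), a tilt matrix diag(t, 1), and a second rotation (longitude); the two translation parameters are handled by the underlying SIFT matching. The sampling scheme (tilts in powers of the square root of 2, longitude steps of 72 degrees divided by the tilt) follows the commonly cited ASIFT description. The function names and default values are hypothetical and are not taken from the thesis.

```python
import math

def rotation(angle):
    """2x2 rotation matrix for an angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def affine_from_camera(zoom, spin, tilt, longitude):
    """Linear part of the affine map: zoom * R(spin) * diag(tilt, 1) * R(longitude)."""
    tilt_matrix = [[tilt, 0.0], [0.0, 1.0]]
    m = matmul(rotation(spin), matmul(tilt_matrix, rotation(longitude)))
    return [[zoom * v for v in row] for row in m]

def simulated_views(max_tilt_steps=5, base_step_deg=72.0):
    """(tilt, longitude) pairs at which simulated views would be generated."""
    views = [(1.0, 0.0)]  # the untilted, frontal view
    for i in range(1, max_tilt_steps + 1):
        tilt = 2.0 ** (i / 2.0)          # 1.41, 2, 2.83, ...
        step = base_step_deg / tilt      # finer longitude sampling at higher tilt
        k = 0
        while k * step < 180.0:
            views.append((tilt, math.radians(k * step)))
            k += 1
    return views

a = affine_from_camera(zoom=2.0, spin=0.3, tilt=2.0, longitude=0.5)
det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
views = simulated_views()
print(len(views))
```

Each simulated view would then be passed to an ordinary SIFT detector, which already handles zoom, rotation, and translation; simulating tilt and longitude supplies the two remaining parameters.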
The third and final phase of the algorithm is tracking, which is performed using the Accelerated Proximal Gradient (APG) approach. APG tracks with a conventional particle filter but speeds up the procedure by using a sparse representation of the probabilities involved. A norm-based minimization technique is introduced that improves the efficiency of the particle filter by dividing the coefficients involved between two types of templates: target templates and trivial templates. Using the APG approach, the coordinates of the bounding box that encloses the object are updated over successive frames, so that the object is tracked. This is the first time that an ensemble of MACH and ASIFT has been employed for object recognition and then merged with the APG algorithm.
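The sparse representation over target and trivial templates can be sketched as follows. This is a simplified illustration, not the tracker used in the thesis: it assumes a plain l1-regularized least-squares objective solved by a proximal gradient iteration with momentum (the accelerated, FISTA-style form), and it omits the non-negativity constraints and the particle filter itself. The function names, the regularization weight `lam`, the iteration count, and the toy data are all assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v toward zero by t)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sparse_code(y, templates, lam=0.05, iters=500):
    """Minimise 0.5 * ||y - A a||^2 + lam * ||a||_1 with A = [targets, identity]."""
    d = len(y)
    # Columns of A: target templates first, then one trivial template per pixel.
    columns = [list(t) for t in templates]
    columns += [[1.0 if i == j else 0.0 for i in range(d)] for j in range(d)]
    n = len(columns)
    # Squared Frobenius norm: a safe upper bound on the gradient's Lipschitz constant.
    step = 1.0 / sum(v * v for col in columns for v in col)
    a = [0.0] * n
    z = a[:]
    t = 1.0
    for _ in range(iters):
        recon = [sum(columns[j][i] * z[j] for j in range(n)) for i in range(d)]
        residual = [recon[i] - y[i] for i in range(d)]
        grad = [sum(columns[j][i] * residual[i] for i in range(d)) for j in range(n)]
        a_next = [soft_threshold(z[j] - step * grad[j], step * lam) for j in range(n)]
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        # Momentum step that distinguishes the accelerated method from plain ISTA.
        z = [a_next[j] + ((t - 1.0) / t_next) * (a_next[j] - a[j]) for j in range(n)]
        a, t = a_next, t_next
    return a

# Two toy target templates over four "pixels".
templates = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
# Candidate patch: the first template with the last pixel corrupted (occluded).
y = [0.5, 0.5, 0.5, 1.3]
coeffs = sparse_code(y, templates)
print([round(c, 3) for c in coeffs])
```

In this toy case the weight should concentrate on the first target template and on the trivial template of the corrupted pixel. In a particle filter, a candidate whose appearance is well explained by the target templates alone would receive a high observation likelihood.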