Abstract:
This thesis addresses the problem of saliency detection in crowded scenes. Crowded scenes are those which have irregular scene density. Such scenes are common in the real world, and their analysis has applications in public management, security, population monitoring, and urban planning. However, the task is highly challenging due to the large number of individuals competing for attention. While saliency models designed for scenes of regular density have shown promise, they remain limited in their ability to detect salient regions in crowded scenes. Recent research has shown the importance of faces in scenes containing people. The face is the most important part of the human body, and psychology studies have shown that talking faces are more salient. Deep learning has emerged as a powerful learning mechanism, able to learn higher-level features when provided with a relatively large amount of labeled training data. Such networks have shown state-of-the-art object and scene recognition results on the ImageNet dataset. Here, we use crowd features to cluster crowd scenes into different crowd levels. Then, combining low-level features with different face attributes, we propose a deep fully connected neural network for each crowd level. We benchmark our results against a previous model for crowded scenes and present superior results on the Crowd dataset.