NUST Institutional Repository

Two Stream Deep CNN-RNN Attentive pooling architecture for Video Based Person Re-Identification


dc.contributor.author Wajeeha Ansar
dc.date.accessioned 2020-11-24T13:19:57Z
dc.date.available 2020-11-24T13:19:57Z
dc.date.issued 2018
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/13785
dc.description Supervisor: en_US
dc.description.abstract Person re-identification is the task of establishing correspondences between images of a person captured at different places and times by different cameras; it is an essential component of visual surveillance systems. In this thesis, we propose a novel two-stream convolutional-recurrent model with attentive pooling. Each stream of the model is a Siamese network and learns a different kind of feature map. To fully exploit the learned feature maps and capture shared features, we fuse the outputs of the two streams. Attentive pooling allows the model to select only the informative frames from the whole input video sequence and then learn spatial and temporal information from these selected frames. The proposed model uses [1] as its base, with the following changes to the base code: (i) we make it a two-stream model; (ii) we add an attentive pooling layer, which until now has only been used for action recognition tasks; (iii) we add an extra dropout layer to the CNN base model; (iv) RGB frames and optical flows are treated as separate inputs so that spatial and temporal information are learned separately; and (v) the two streams are fused into a single Siamese cost feature for person re-ID. Fusion uses a weighted function that gives more weight to the spatial features, because they are more discriminative than the temporal features. Experiments are performed on three publicly available person re-ID datasets: MARS, PRID-2011 and iLIDS-VID. The experimental results show that the proposed model is highly effective for feature extraction and outperforms existing state-of-the-art supervised models. Accuracy improves further when both RGB and optical flows are used as input rather than either of them independently. At rank-1, the proposed model achieves 14.6%, 14.0% and 16% better accuracy than the base model on iLIDS-VID, PRID-2011 and MARS, respectively. en_US
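To make the abstract's architecture concrete, the following is a minimal PyTorch sketch of the components it describes: a per-frame CNN followed by an RNN in each stream, attentive pooling over frames, and a weighted fusion of the RGB (spatial) and optical-flow (temporal) embeddings inside a Siamese comparison. All names (StreamEncoder, AttentivePooling, TwoStreamReID), layer sizes, the dropout rate, the GRU choice, and the fusion weight spatial_weight=0.7 are illustrative assumptions, not the thesis's actual configuration (which builds on [1]).

import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamEncoder(nn.Module):
    """One stream (RGB or optical flow): a small per-frame CNN followed by an RNN.
    Layer sizes are illustrative, not the thesis's exact configuration."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Dropout(0.5),              # extra dropout, as described in the abstract
            nn.AdaptiveAvgPool2d(1),
        )
        self.rnn = nn.GRU(32, feat_dim, batch_first=True)

    def forward(self, clip):              # clip: (B, T, C, H, W)
        b, t = clip.shape[:2]
        frames = self.cnn(clip.flatten(0, 1)).flatten(1)   # (B*T, 32)
        seq, _ = self.rnn(frames.view(b, t, -1))           # (B, T, feat_dim)
        return seq

class AttentivePooling(nn.Module):
    """Scores each time step and returns an attention-weighted sum,
    so informative frames dominate the sequence representation."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, seq):               # seq: (B, T, D)
        w = F.softmax(self.score(seq), dim=1)   # attention over the T frames
        return (w * seq).sum(dim=1)             # (B, D)

class TwoStreamReID(nn.Module):
    """Two-stream model: RGB (spatial) and optical-flow (temporal) streams,
    fused with a fixed weight that favours the spatial features."""
    def __init__(self, feat_dim=128, spatial_weight=0.7):
        super().__init__()
        self.rgb = StreamEncoder(3, feat_dim)
        self.flow = StreamEncoder(2, feat_dim)  # flow has x/y displacement channels
        self.pool = AttentivePooling(feat_dim)
        self.w = spatial_weight                 # hypothetical value; the thesis's weight is not given here

    def embed(self, rgb_clip, flow_clip):
        f_rgb = self.pool(self.rgb(rgb_clip))
        f_flow = self.pool(self.flow(flow_clip))
        return self.w * f_rgb + (1 - self.w) * f_flow

    def forward(self, pair_a, pair_b):
        # Siamese use: embed both video sequences with shared weights, then compare.
        ea = self.embed(*pair_a)
        eb = self.embed(*pair_b)
        return F.pairwise_distance(ea, eb)

# Minimal smoke test with random clips (batch=2, 8 frames, 64x32 pixels).
if __name__ == "__main__":
    model = TwoStreamReID()
    rgb = torch.randn(2, 8, 3, 64, 32)
    flow = torch.randn(2, 8, 2, 64, 32)
    print(model((rgb, flow), (rgb, flow)).shape)  # torch.Size([2])

In a Siamese setup like this, the distance output would typically feed a contrastive or hinge loss over matched/mismatched sequence pairs; the sketch stops at the embedding comparison since the abstract does not specify the loss.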
dc.publisher SEECS, National University of Sciences and Technology, Islamabad en_US
dc.subject Computer Science en_US
dc.title Two Stream Deep CNN-RNN Attentive pooling architecture for Video Based Person Re-Identification en_US
dc.type Thesis en_US


This item appears in the following Collection(s)

  • MS [375]
