Abstract:
Given a video containing a person, the goal of person re-identification is to identify the same person in videos captured by different cameras. A common approach to this problem is to first extract image features for all frames in the video. These frame-level features are then combined (e.g. via temporal pooling) to form a video-level feature vector. The video-level features of two input videos are then compared by calculating the distance between them. More recently, attention-based learning mechanisms have been proposed for this problem. In particular, recurrent neural networks have been used to generate the attention scores of frames in a video. However, a limitation of RNN-based approaches is that RNNs have difficulty capturing long-range dependencies in videos. Inspired by the success of non-local neural networks, we propose a novel non-local temporal attention model in this paper. Our model can effectively capture long-range and global dependencies among the frames of a video. Extensive experiments on three different benchmark datasets (i.e. iLIDS-VID, PRID-2011 and SDU-VID) show that our proposed method outperforms other state-of-the-art approaches.
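The pipeline described in the abstract can be illustrated with a minimal sketch: non-local (self-)attention applied across the frames of a video, followed by temporal pooling into a video-level feature and a distance comparison between two videos. This is not the authors' implementation; the module names, dimensions, and the embedded-Gaussian attention form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalTemporalAttention(nn.Module):
    """Self-attention across the T frames of a video (hypothetical sketch)."""

    def __init__(self, dim, inner_dim=None):
        super().__init__()
        inner_dim = inner_dim or dim // 2
        self.theta = nn.Linear(dim, inner_dim)   # query projection
        self.phi = nn.Linear(dim, inner_dim)     # key projection
        self.g = nn.Linear(dim, inner_dim)       # value projection
        self.out = nn.Linear(inner_dim, dim)     # restore feature dimension

    def forward(self, x):                        # x: (B, T, dim) frame-level features
        q, k, v = self.theta(x), self.phi(x), self.g(x)
        # Every frame attends to every other frame, capturing long-range dependencies.
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        y = self.out(attn @ v)                   # (B, T, dim)
        return x + y                             # residual connection, as in non-local networks


def video_feature(frame_feats, attention):
    """Refine frame features with non-local attention, then temporal-average pool."""
    refined = attention(frame_feats)                 # (B, T, dim)
    return F.normalize(refined.mean(dim=1), dim=-1)  # (B, dim) video-level feature


if __name__ == "__main__":
    attn = NonLocalTemporalAttention(dim=128)
    video_a = torch.randn(1, 16, 128)            # 16 frames, 128-d frame features (toy data)
    video_b = torch.randn(1, 16, 128)
    fa, fb = video_feature(video_a, attn), video_feature(video_b, attn)
    # A smaller distance suggests the two videos show the same identity.
    print(torch.cdist(fa, fb))
```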
Published in: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Date of Conference: 18-21 September 2019
Date Added to IEEE Xplore: 25 November 2019