research-article

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Authors:
Chen Shen, Zhejiang University & Alibaba Group, Hangzhou, China
Zhongming Jin, Alibaba Group, Hangzhou, China
Yiru Zhao, Shanghai Jiao Tong University & Alibaba Group, Shanghai, China
Zhihang Fu, Zhejiang University & Alibaba Group, Hangzhou, China
Rongxin Jiang, Zhejiang University, Hangzhou, China
Yaowu Chen, Zhejiang University, Hangzhou, China
Xian-Sheng Hua, Alibaba Group, Hangzhou, China

MM '17: Proceedings of the 25th ACM international conference on Multimedia, October 2017, Pages 1942–1950
https://doi.org/10.1145/3123266.3123452

Published: 23 October 2017

ABSTRACT

Person re-identification (re-ID), which aims to spot a person of interest across multiple camera views, has gained increasing attention in the computer vision community. In this paper, we propose a novel deep Siamese architecture based on a convolutional neural network (CNN) and multi-level similarity perception. According to the distinct characteristics of diverse feature maps, we apply different similarity constraints to both low-level and high-level feature maps during the training stage. Our network can therefore efficiently learn discriminative feature representations at different levels, which significantly improves re-ID performance. Our framework has two additional benefits. First, classification constraints can be easily incorporated into the framework, forming a unified multi-task network together with the similarity constraints. Second, because similarity-comparison information has been encoded in the network's learned parameters via back-propagation, pairwise input is not necessary at test time. This means we can extract the features of each gallery image and build an index offline, which is essential for large-scale real-world applications. Experimental results on multiple challenging benchmarks demonstrate that our method achieves strong performance compared with current state-of-the-art approaches.
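
The offline indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors below are toy values standing in for CNN outputs, the function names (`l2_normalize`, `build_gallery_index`, `rank_gallery`) are hypothetical, and cosine similarity is an assumed matching score that the abstract does not specify. The point is only that gallery features are computed once, and a query is then compared against the stored index without pairwise network input.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def build_gallery_index(gallery_features):
    """Offline step: store one normalized feature per gallery image.

    gallery_features maps an image id to its feature vector. In a real
    system the vectors would come from a single branch of the trained
    network; here they are toy values (assumption).
    """
    return {image_id: l2_normalize(feat) for image_id, feat in gallery_features.items()}


def rank_gallery(query_feature, index):
    """Online step: rank gallery images by cosine similarity to the query."""
    q = l2_normalize(query_feature)
    scores = [
        (image_id, sum(a * b for a, b in zip(q, g)))
        for image_id, g in index.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


# Toy gallery with three images and 3-dimensional features (illustrative only).
gallery = {
    "g1": [1.0, 0.0, 0.0],
    "g2": [0.0, 1.0, 0.0],
    "g3": [0.7, 0.7, 0.0],
}
index = build_gallery_index(gallery)          # built once, offline
ranking = rank_gallery([0.9, 0.1, 0.0], index)  # one query at test time
```

Because the index stores only per-image features, adding a new query requires a single forward pass plus a similarity search, rather than one forward pass per query-gallery pair.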

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision problems
        Object identification
  Machine learning

Published in
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
General Chairs: Qiong Liu (FXPAL, USA), Rainer Lienhart (Universität Augsburg, Germany), Haohong Wang (TCL America, USA)
Program Chairs: Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan), Susanne Boll (University of Oldenburg, Germany), Phoebe Chen (La Trobe University, Australia), Gerald Friedland (Lawrence Livermore National Lab, USA), Jia Li (Google, USA), Shuicheng Yan (Qihoo 360, China)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
convolutional neural network
deep siamese architecture
multi-level similarity perception
person re-identification
Qualifiers
- research-article
Acceptance Rates
MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%