Abstract
Person search aims to locate and identify a query person in a gallery of uncropped scene images, combining pedestrian detection and person re-identification (re-ID). Methods based on Faster R-CNN are widely used to tackle the two sub-tasks jointly, but they ignore the feature misalignment problem: the region from which re-ID features are extracted is not fully aligned with the detected bounding boxes (BBoxes). Because re-ID is a fine-grained task, extracting accurate appearance features is crucial. In addition, the BBoxes detected in gallery images vary widely in granularity, and treating boxes of different granularity equally when estimating their similarity to the query is flawed. Three-way decision is a field of research on human-inspired computation. Inspired by it, we propose a three-way-based feature alignment framework (3W-AlignNet) to optimize re-ID feature localization. The framework iteratively generates new BBoxes and features from the previous BBoxes. Three-way decision theory is applied to avoid the mismatch problem caused by increasing Intersection over Union (IoU). We further propose a Granularity Weighted Similarity (GWS) algorithm to alleviate the granularity mismatch problem. Extensive experiments show that our method outperforms all other state-of-the-art end-to-end methods on two widely used person search datasets, CUHK-SYSU and PRW.
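The abstract does not state how the three-way rule is parameterized. As a minimal sketch, assuming the standard two-threshold form of three-way decisions (accept, reject, or defer), the IoU between a candidate box and a reference box could be sorted into three regions as shown here. The function names and the thresholds `alpha = 0.7` and `beta = 0.3` are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def three_way_decision(score: float, alpha: float = 0.7, beta: float = 0.3) -> str:
    """Map a score to one of three regions using two thresholds.

    alpha and beta are illustrative values (assumption), with beta < alpha.
    """
    if not beta < alpha:
        raise ValueError("beta must be smaller than alpha")
    if score >= alpha:
        return "accept"   # positive region
    if score <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region: postpone the decision


if __name__ == "__main__":
    candidate = (0.0, 0.0, 2.0, 2.0)
    reference = (1.0, 1.0, 3.0, 3.0)
    print(three_way_decision(iou(candidate, reference)))
```

The deferred region is what distinguishes this from a single-threshold rule: boxes whose overlap is neither clearly high nor clearly low are set aside rather than forced into a binary label.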
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China (Grant Nos. 61976158, 61673301, and 62076182).
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Yang, Y., Miao, D. & Zhang, H. 3W-AlignNet: a Feature Alignment Framework for Person Search with Three-Way Decision Theory. Cogn Comput 14, 1913–1923 (2022). https://doi.org/10.1007/s12559-021-09898-7
DOI: https://doi.org/10.1007/s12559-021-09898-7