Attribute-Image Person Re-identification via Modal-Consistent Metric Learning

International Journal of Computer Vision

Abstract

Attribute-image person re-identification (AIPR) is a cross-modal retrieval task that searches for person images matching a given list of attributes. Because of the large modal gap between attributes and images, current AIPR methods generally depend on cross-modal feature alignment, but they pay insufficient attention to similarity metric jitter across varying modal configurations (i.e., attribute probe vs. image gallery, image probe vs. attribute gallery, image probe vs. image gallery, and attribute probe vs. attribute gallery). In this paper, we propose a modal-consistent metric learning (MCML) method that stably measures comprehensive similarities between attributes and images. MCML differs from previous methods in two significant ways. First, MCML provides a complete multi-modal triplet (CMMT) loss function that pulls the farthest positive pair as close together as possible while pushing the nearest negative pair as far apart as possible, regardless of their modalities. Second, MCML develops a modal-consistent matching regularization (MCMR) that reduces the diversity of matching matrices and guides consistent matching behavior across the varying modal configurations. By integrating the CMMT loss function and MCMR, MCML requires no complex cross-modal feature alignment. Theoretically, we derive a generalization bound that establishes the stability of the MCML model via on-average stability. Experimentally, extensive results on the PETA and Market-1501 datasets show that the proposed MCML is superior to state-of-the-art approaches.
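The two components described in the abstract can be illustrated with a minimal NumPy sketch. It assumes Euclidean distances, a margin of 0.3, and batch-hard mining over the union of attribute and image embeddings for the CMMT-style loss. The abstract does not say how MCMR measures the "diversity of matching matrices", so `mcmr_penalty` is a hypothetical stand-in (the element-wise variance of the four distance matrices, assuming attribute row i and image row i share an identity), not the authors' formulation. All function names here are illustrative.

```python
import numpy as np


def pairwise_dist(x, y):
    """Euclidean distance matrix between rows of x (n, d) and rows of y (m, d)."""
    sq = (x ** 2).sum(1)[:, None] + (y ** 2).sum(1)[None, :] - 2.0 * x @ y.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from rounding


def cmmt_loss(attr_emb, img_emb, labels_attr, labels_img, margin=0.3):
    """Batch-hard triplet loss over the union of both modalities (a sketch).

    For every anchor (attribute or image), take the farthest positive and the
    nearest negative among ALL samples in the batch, regardless of modality.
    """
    z = np.concatenate([attr_emb, img_emb], axis=0)
    y = np.concatenate([labels_attr, labels_img], axis=0)
    d = pairwise_dist(z, z)
    same = y[:, None] == y[None, :]
    eye = np.eye(len(y), dtype=bool)
    pos = np.where(same & ~eye, d, -np.inf).max(axis=1)  # farthest positive
    neg = np.where(~same, d, np.inf).min(axis=1)         # nearest negative
    return np.maximum(margin + pos - neg, 0.0).mean()


def mcmr_penalty(attr_emb, img_emb):
    """HYPOTHETICAL consistency term: variance across four matching matrices.

    Assumes attr_emb[i] and img_emb[i] belong to the same identity, so the four
    modal configurations should produce similar distance matrices.
    """
    mats = np.stack([
        pairwise_dist(attr_emb, img_emb),   # attribute probe vs. image gallery
        pairwise_dist(img_emb, attr_emb),   # image probe vs. attribute gallery
        pairwise_dist(img_emb, img_emb),    # image probe vs. image gallery
        pairwise_dist(attr_emb, attr_emb),  # attribute probe vs. attribute gallery
    ])
    return mats.var(axis=0).mean()


rng = np.random.default_rng(0)
attr = rng.normal(size=(4, 8))
img = attr + 0.1 * rng.normal(size=(4, 8))
ids = np.array([0, 0, 1, 1])
total = cmmt_loss(attr, img, ids, ids) + 0.1 * mcmr_penalty(attr, img)
```

The weight 0.1 on the regularizer is an arbitrary placeholder; because mining ignores modality, an attribute vector can serve as the hardest negative for an image anchor and vice versa.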

Figs. 1–5 (images not included in this preview).

Notes

  1. We use \(i \in [m]\) to denote that \(i\) is drawn from \([m] = \{1, 2, \ldots, m\}\). The same notation applies to \(l_i \in [c]\).

  2. The single-modal HMT loss function is applied to images only, whereas the cross-modal HMT loss function is applied to both images and attributes.
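The footnote distinguishes the two variants by which samples the HMT loss sees. Assuming, for illustration only, that the HMT loss mines the hardest (nearest) negative for each image anchor within a batch, the sketch below shows how adding attribute embeddings to the mining pool can expose a harder negative. The helper `hardest_negative_dist` and the toy coordinates are hypothetical.

```python
import numpy as np


def hardest_negative_dist(anchors, anchor_ids, pool, pool_ids):
    """Euclidean distance from each anchor to its nearest negative in `pool`."""
    d = np.linalg.norm(anchors[:, None, :] - pool[None, :, :], axis=-1)
    d = np.where(anchor_ids[:, None] != pool_ids[None, :], d, np.inf)  # mask positives
    return d.min(axis=1)


# Toy batch (hypothetical values): two identities, one image and one attribute vector each.
img = np.array([[0.0, 0.0], [5.0, 0.0]])
img_ids = np.array([0, 1])
attr = np.array([[0.0, 1.0], [1.0, 0.0]])
attr_ids = np.array([0, 1])

# Single-modal: image anchors mine negatives among images only.
single = hardest_negative_dist(img, img_ids, img, img_ids)

# Cross-modal: image anchors mine negatives among images and attributes.
pool = np.concatenate([img, attr])
pool_ids = np.concatenate([img_ids, attr_ids])
cross = hardest_negative_dist(img, img_ids, pool, pool_ids)

print(single)  # [5. 5.]
print(cross)   # [1. 5.]
```

For the first image, the attribute vector of the other identity lies closer than any other-identity image, so only the cross-modal variant would penalize it.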

References

  • Andrew, G., Arora, R., Bilmes, J., & Livescu, K. (2013). Deep canonical correlation analysis. In ICML (pp. 1247–1255).

  • Bousquet, O., Klochkov, Y., & Zhivotovskiy, N. (2020). Sharper bounds for uniformly stable algorithms. In PMLR conference on learning theory (pp. 610–626).

  • Cao, Y. T., Wang, J., & Tao, D. (2020). Symbiotic adversarial learning for attribute-based person search. In ECCV.

  • Deng, Y., Luo, P., Loy, C. C., & Tang, X. (2014). Pedestrian attribute recognition at far distance. In ACMMM (pp. 789–792).

  • Dong, Q., Gong, S., & Zhu, X. (2019). Person search by text attribute query as zero-shot learning. In CVPR (pp. 3652–3661).

  • Eisenschtat, A., & Wolf, L. (2017). Linking image and text with 2-way nets. In CVPR (pp. 4601–4611).

  • Feldman, V., & Vondrak, J. (2018). Generalization bounds for uniformly stable algorithms. In NeurIPS (pp. 9770–9780).

  • Feldman, V., & Vondrak, J. (2019). High probability generalization bounds for uniformly stable algorithms with nearly optimal rate. In PMLR conference on learning theory (pp. 1270–1279).

  • Felix, R., Kumar, V. B., Reid, I., & Carneiro, G. (2018). Multi-modal cycle-consistent generalized zero-shot learning. In ECCV (pp. 21–37).

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In NIPS (pp. 2672–2680).

  • He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In CVPR (pp. 9729–9738).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR (pp. 770–778).

  • Hubert Tsai, Y. H., Huang, L. K., & Salakhutdinov, R. (2017). Learning robust visual-semantic embeddings. In ICCV (pp. 3571–3580).

  • Iodice, S., & Mikolajczyk, K. (2020). Text attribute aggregation and visual feature decomposition for person search. In BMVC.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML (pp. 448–456).

  • Jeong, B., Park, J., & Kwak, S. (2021). ASMR: Learning attribute-based person search with adaptive semantic margin regularizer. In ICCV (pp. 12016–12025).

  • Ji, Z., He, E., Wang, H., & Yang, A. (2019). Image-attribute reciprocally guided attention network for pedestrian attribute recognition. Pattern Recognition Letters, 120, 89–95.

  • Ji, Z., Hu, Z., He, E., Han, J., & Pang, Y. (2020). Pedestrian attribute recognition based on multiple time steps attention. Pattern Recognition Letters, 138, 170–176.

  • Ji, Z., Sun, Y., Yu, Y., Pang, Y., & Han, J. (2019). Attribute-guided network for cross-modal zero-shot hashing. IEEE Transactions on Neural Networks and Learning Systems, 31(1), 321–330.

  • Layne, R., Hospedales, T. M., & Gong, S. (2012a). Towards person identification and re-identification with attributes. In ECCV (pp. 402–412).

  • Layne, R., Hospedales, T. M., & Gong, S. (2012b). Person re-identification by attributes. In BMVC (p. 8).

  • Lei, Y., Ledent, A., & Kloft, M. (2020). Sharper generalization bounds for pairwise learning. In NeurIPS (Vol. 33).

  • Li, D., Chen, X., & Huang, K. (2015a). Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios. In ACPR (pp. 111–115).

  • Li, D., Chen, X., & Huang, K. (2015b). Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios. In ACPR (pp. 111–115).

  • Li, S., Xiao, T., Li, H., Yang, W., & Wang, X. (2017). Identity-aware textual-visual matching with latent co-attention. In ICCV (pp. 1890–1899).

  • Li, W., Zhu, X., & Gong, S. (2020). Scalable person re-identification by harmonious attention. International Journal of Computer Vision, 128(6), 1635–1653.

  • Li, Z., Min, W., Song, J., Zhu, Y., Kang, L., Wei, X., Wei, X., & Jiang, S. (2022). Rethinking the optimization of average precision: Only penalizing negative instances before positive ones is enough. In AAAI (Vol. 36, pp. 1518–1526).

  • Lin, X., Ren, P., Xiao, Y., Chang, X., & Hauptmann, A. (2021). Person search challenges and solutions: A survey.

  • Lin, Y., Zheng, L., Zheng, Z., Wu, Y., Hu, Z., Yan, C., & Yang, Y. (2019). Improving person re-identification by attribute and identity learning. Pattern Recognition, 95, 151–161.

  • Liu, L., Zhang, H., Xu, X., Zhang, Z., & Yan, S. (2019). Collocating clothes with generative adversarial networks cosupervised by categories and attributes: A multidiscriminator framework. IEEE Transactions on Neural Networks and Learning Systems, 31(9), 3540–3554.

  • Liu, P., Liu, X., Yan, J., & Shao, J. (2018). Localization guided learning for pedestrian attribute recognition. In BMVC.

  • Liu, X., Zhao, H., Tian, M., Sheng, L., Shao, J., Yi, S., Yan, J., & Wang, X. (2017). HydraPlus-Net: Attentive deep features for pedestrian analysis. In ICCV (pp. 350–359).

  • Luo, H., Jiang, W., Gu, Y., Liu, F., Liao, X., Lai, S., & Gu, J. (2019). A strong baseline and batch normalization neck for deep person re-identification. IEEE Transactions on Multimedia.

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In NeurIPS (pp. 8026–8037).

  • Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In CVPR (pp. 815–823).

  • Schumann, A., & Stiefelhagen, R. (2017). Person re-identification by deep learning attribute-complementary information. In CVPR Workshop (pp. 20–28).

  • Su, C., Zhang, S., Xing, J., Gao, W., & Tian, Q. (2016). Deep attributes driven multi-camera person re-identification. In ECCV (pp. 475–491).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In CVPR (pp. 1–9).

  • Tan, Z., Yang, Y., Wan, J., Guo, G., & Li, S. Z. (2020). Relation-aware pedestrian attribute recognition with graph convolutional networks. In AAAI (pp. 12055–12062).

  • Tan, Z., Yang, Y., Wan, J., Hang, H., Guo, G., & Li, S. Z. (2019). Attention-based pedestrian attribute analysis. IEEE Transactions on Image Processing, 28(12), 6126–6140.

  • Vaquero, D. A., Feris, R. S., Tran, D., Brown, L., Hampapur, A., & Turk, M. (2009). Attribute-based people search in surveillance environments. In Workshop on applications of computer vision (pp. 1–8).

  • Wang, B., Yang, Y., Xu, X., Hanjalic, A., & Shen, H. (2017). Adversarial cross-modal retrieval. In ACM MM (pp. 154–162).

  • Wang, J., Zhu, X., Gong, S., & Li, W. (2018). Transferable joint attribute-identity deep learning for unsupervised person re-identification. In CVPR (pp. 2275–2284).

  • Wang, W., Arora, R., Livescu, K., & Bilmes, J. (2015). On deep multi-view representation learning. In ICML (pp. 1083–1092).

  • Wang, X., Han, X., Huang, W., Dong, D., & Scott, M. R. (2019). Multi-similarity loss with general pair weighting for deep metric learning. In CVPR (pp. 5022–5030).

  • Wu, M., Huang, D., Guo, Y., & Wang, Y. (2019). Distraction-aware feature learning for human attribute recognition via coarse-to-fine attention mechanism. In AAAI.

  • Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853.

  • Yang, Y., Tan, Z., Tiwari, P., Pandey, H. M., Wan, J., Lei, Z., Guo, G., & Li, S. Z. (2021). Cascaded split-and-aggregate learning with feature recombination for pedestrian attribute recognition. International Journal of Computer Vision (pp. 1–14).

  • Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., & Hoi, S. C. (2021). Deep learning for person re-identification: A survey and outlook. IEEE Transactions on Pattern Analysis and Machine Intelligence (pp. 1–1).

  • Yin, J., Wu, A., & Zheng, W. S. (2020). Fine-grained person re-identification. International Journal of Computer Vision, 128(6), 1654–1672.

  • Yin, Z., Zheng, W. S., Wu, A., Yu, H. X., Wan, H., Guo, X., Huang, F., & Lai, J. (2018). Adversarial attribute-image person re-identification. In IJCAI (pp. 1100–1106).

  • Yu, K., Leng, B., Zhang, Z., Li, D., & Huang, K. (2017). Weakly-supervised learning of mid-level features for pedestrian attribute recognition and localization. In ECCV.

  • Zeng, H., Ai, H., Zhuang, Z., & Chen, L. (2020). Multi-task learning via co-attentive sharing for pedestrian attribute recognition. In ICME (pp. 1–6).

  • Zhan, Y., Yu, J., Yu, T., & Tao, D. (2019). On exploring undetermined relationships for visual relationship detection. In CVPR (pp. 5128–5137).

  • Zhan, Y., Yu, J., Yu, T., & Tao, D. (2020). Multi-task compositional network for visual relationship detection. International Journal of Computer Vision, 128(8), 2146–2165.

  • Zhan, Y., Yu, J., Yu, Z., Zhang, R., Tao, D., & Tian, Q. (2018). Comprehensive distance-preserving autoencoders for cross-modal retrieval. In ACM international conference on multimedia (pp. 1137–1145).

  • Zhang, J., Chen, Z., & Tao, D. (2021). Towards high performance human keypoint detection. International Journal of Computer Vision, 129(9), 2639–2662.

  • Zhang, S., Song, Z., Cao, X., Zhang, H., & Zhou, J. (2019). Task-aware attention model for clothing attribute prediction. IEEE Transactions on Circuits and Systems for Video Technology, 30(4), 1051–1064.

  • Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV (pp. 1116–1124).

  • Zhu, J., Liao, S., Lei, Z., & Li, S. Z. (2017). Multi-label convolutional neural network based pedestrian attribute classification. Image and Vision Computing, 58, 224–229.

  • Zhu, J., Liao, S., Yi, D., Lei, Z., & Li, S. Z. (2015). Multi-label CNN based pedestrian attribute learning for soft biometrics. In ICB (pp. 535–540).

  • Zhu, J., Zeng, H., Huang, J., Zhu, X., Lei, Z., Cai, C., & Zheng, L. (2019). Body symmetry and part-locality-guided direct nonparametric deep feature enhancement for person reidentification. IEEE Internet of Things Journal, 7(3), 2053–2065.

  • Zhu, J., Zeng, H., Liao, S., Lei, Z., Cai, C., & Zheng, L. (2017). Deep hybrid similarity learning for person re-identification. IEEE Transactions on Circuits and Systems for Video Technology, 28(11), 3183–3193.

Author information

Correspondence to Jianqing Zhu or Liu Liu.

Additional information

Communicated by Suha Kwak, Ph.D.

This work was supported in part by the National Key R&D Program of China under Grant 2021YFE0205400, in part by the National Natural Science Foundation of China under Grants 61976098 and 62002090, and in part by the Natural Science Foundation for Outstanding Young Scholars of Fujian Province under Grant 2022J06023.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, J., Liu, L., Zhan, Y. et al. Attribute-Image Person Re-identification via Modal-Consistent Metric Learning. Int J Comput Vis 131, 2959–2976 (2023). https://doi.org/10.1007/s11263-023-01841-7

  • DOI: https://doi.org/10.1007/s11263-023-01841-7