Language Person Search with Pair-Based Weighting Loss

Zhang, Peng; Ouyang, Deqiang; Jiang, Chunlin; Shao, Jie

doi:10.1007/978-3-030-67832-6_19


Conference paper
First Online: 21 January 2021


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Abstract

Language person search, which retrieves images of a specific person from a natural language description, is becoming a research hotspot in person re-identification. Unlike conventional person re-identification, which is an image retrieval task, language person search must bridge the heterogeneous semantic gap between the image and text modalities. To address this, most existing methods employ a softmax-based classification loss to embed visual and textual features into a common latent space. However, pair-based losses, a successful approach in metric learning, have rarely been explored for this task. Recently, pair-based weighting losses for deep metric learning have shown great potential for improving performance on many retrieval-related tasks. In this paper, to better correlate person images with a given language description, we introduce a pair-based weighting loss that encourages the model to assign appropriate weights to different image-text pairs. We conducted extensive experiments on the CUHK-PEDES dataset, and the results validate the effectiveness of the proposed method.
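
To make the idea of weighting image-text pairs concrete, the following is a minimal sketch of one well-known pair-weighting formulation, the multi-similarity style of loss from the deep metric learning literature. It is an illustration only and is not claimed to be the loss used in this paper. The function names, the identity-label inputs, and the hyperparameter values (alpha, beta, lam) are all assumptions chosen for the example.

```python
import math


def pair_weighting_loss(sim, image_ids, text_ids, alpha=2.0, beta=40.0, lam=0.5):
    """Multi-similarity-style loss over an image-text similarity matrix.

    sim[i][k] is the similarity between image i and text k. A pair counts
    as positive when image_ids[i] == text_ids[k]. The hyperparameters are
    illustrative assumptions, not values taken from the paper.
    """
    total = 0.0
    for i, row in enumerate(sim):
        pos = [s for s, t in zip(row, text_ids) if t == image_ids[i]]
        neg = [s for s, t in zip(row, text_ids) if t != image_ids[i]]
        # Positive pairs are penalised more the further they fall below lam.
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        # Negative pairs are penalised more the further they rise above lam.
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
        total += pos_term + neg_term
    return total / len(sim)


def negative_pair_weights(row, row_id, text_ids, beta=40.0, lam=0.5):
    """Implicit weight of each negative pair: the derivative of the negative
    term with respect to that pair's similarity. Positive pairs get 0."""
    exps = [math.exp(beta * (s - lam)) if t != row_id else 0.0
            for s, t in zip(row, text_ids)]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps]


# Toy batch: two images, two descriptions, one identity each.
well_aligned = [[0.9, 0.1], [0.2, 0.8]]
misaligned = [[0.1, 0.9], [0.8, 0.2]]
ids = [0, 1]
print(pair_weighting_loss(well_aligned, ids, ids))
print(pair_weighting_loss(misaligned, ids, ids))
```

In this sketch the weights are not set by hand: they fall out of the gradient, so a hard negative (a wrong description that scores highly) receives a much larger weight than an easy one, which is the behaviour the abstract attributes to pair-based weighting.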



Acknowledgments

This work was supported by the Major Scientific and Technological Special Project of Guizhou Province (No. 20183002) and the Sichuan Science and Technology Program (No. 2019YFG0535).

Author information

Authors and Affiliations

Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, 550025, China
Peng Zhang
Center for Future Media, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
Peng Zhang, Deqiang Ouyang & Jie Shao
Sichuan Artificial Intelligence Research Institute, Yibin, 644000, China
Deqiang Ouyang, Chunlin Jiang & Jie Shao


Corresponding author

Correspondence to Jie Shao.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras


About this paper

Cite this paper

Zhang, P., Ouyang, D., Jiang, C., Shao, J. (2021). Language Person Search with Pair-Based Weighting Loss. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_19


DOI: https://doi.org/10.1007/978-3-030-67832-6_19
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science, Computer Science (R0)
