Abstract
Person Re-identification (ReID) is the computer vision task of retrieving a person of interest across multiple non-overlapping cameras. Most existing methods are built on convolutional neural networks trained with supervision, which may miss subtle visual cues and global information due to pooling operations and the weight-sharing mechanism. To address this, we propose a novel Transformer-based ReID method that pre-trains with Masked Autoencoders and fine-tunes with locality enhancement. In our method, Masked Autoencoders are pre-trained in a self-supervised manner on large-scale unlabeled data, so that subtle visual cues are learned through the pixel-level reconstruction loss. With the Transformer backbone, the global feature is extracted as the classification token, which integrates information from different patches via the self-attention mechanism. To take full advantage of local information, the proposed locality enhancement module reshapes the patch tokens into a feature map and applies convolution to extract local features. Global and local features are combined to obtain more robust representations of person images for inference. To the best of our knowledge, this is the first work to utilize generative self-supervised learning for representation learning in ReID. Experiments show that the proposed method achieves competitive performance in terms of parameter scale, computation overhead, and ReID accuracy compared with the state of the art. Code is available at https://github.com/YanzuoLu/MALE.
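As a rough illustration of the fine-tuning design described above, the following sketch shows how patch tokens from a Vision Transformer can be reshaped into a 2-D feature map, convolved to obtain a local feature, and concatenated with the global classification token. This is not the authors' implementation: the module name, feature dimension, 16×8 patch grid, and single 3×3 convolution are all illustrative assumptions; see the official repository for the actual locality enhancement module.

```python
# Minimal sketch (assumed design, not the authors' code) of the locality
# enhancement idea: reshape ViT patch tokens into a feature map, apply a
# convolution for local features, and combine with the global [CLS] token.
import torch
import torch.nn as nn


class LocalityEnhancement(nn.Module):
    def __init__(self, dim: int = 768, grid_h: int = 16, grid_w: int = 8):
        super().__init__()
        # Assumed patch grid, e.g. a 256x128 person image with 16x16 patches.
        self.grid_h, self.grid_w = grid_h, grid_w
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)  # local feature extraction
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, 1 + N, D) Transformer output with the [CLS] token first.
        cls_token, patch_tokens = tokens[:, 0], tokens[:, 1:]
        B, N, D = patch_tokens.shape
        # Reshape the patch sequence back into a (D, H, W) feature map.
        fmap = patch_tokens.transpose(1, 2).reshape(B, D, self.grid_h, self.grid_w)
        local = self.pool(self.conv(fmap)).flatten(1)  # (B, D) local feature
        # Concatenate global and local features for the final representation.
        return torch.cat([cls_token, local], dim=1)    # (B, 2D)


# Usage example with dummy tokens: batch of 4, 128 patches, dim 768.
module = LocalityEnhancement()
feats = module(torch.randn(4, 1 + 128, 768))
print(feats.shape)  # torch.Size([4, 1536])
```

The design intuition is that convolution over the reshaped feature map imposes a locality prior that plain self-attention lacks, complementing the global classification token.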
Acknowledgments
This work was supported partially by NSFC (No. 61906218), Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515011497), and Science and Technology Program of Guangzhou (No. 202002030371).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, Y., Zhang, M., Lin, Y., Ma, A.J., Xie, X., Lai, J. (2022). Improving Pre-trained Masked Autoencoder via Locality Enhancement for Person Re-identification. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13535. Springer, Cham. https://doi.org/10.1007/978-3-031-18910-4_41
DOI: https://doi.org/10.1007/978-3-031-18910-4_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18909-8
Online ISBN: 978-3-031-18910-4
eBook Packages: Computer Science, Computer Science (R0)