
Cross-modal Co-occurrence Attributes Alignments for Person Search by Language

Authors:

Kai Niu,

Yanning Zhang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 4426 - 4434

https://doi.org/10.1145/3503161.3547753

Published: 10 October 2022


Abstract

Person search by language aims to retrieve images of a pedestrian of interest given a free-form natural language description, and has important applications in smart video surveillance. Although considerable effort has been devoted to aligning images with sentences, the challenge of reporting bias, i.e., attributes that are only partially matched across modalities, still introduces substantial noise and seriously degrades retrieval accuracy. To address this challenge, we propose a novel cross-modal matching method named Cross-modal Co-occurrence Attributes Alignments (C2A2), which better handles this noise and significantly improves retrieval performance for person search by language. First, we construct visual and textual attribute dictionaries via matrix decomposition, and perform cross-modal alignments using denoised reconstruction features to suppress noise from pedestrian-unrelated elements. Second, guided by the learned attribute dictionaries, we re-gather image pixels and sentence words to adaptively form more discriminative co-occurrence attributes in both modalities. These re-gathered co-occurrence attributes are then captured by explicit cross-modal one-to-one alignments that account for relations across modalities, further alleviating noise from non-corresponding attributes. The whole C2A2 method is trained end-to-end without any pre-processing and adds negligible computational overhead. It significantly outperforms existing solutions and achieves new state-of-the-art retrieval performance on two large-scale benchmarks, CUHK-PEDES and RSTPReid.
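The abstract names the main steps without specifying their exact form. The following Python/NumPy sketch illustrates one possible reading of them on toy data; it is not the authors' C2A2 implementation. Every concrete choice is an assumption: non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates stands in for the unspecified matrix decomposition, the low-rank reconstruction stands in for the denoised feature, a softmax soft-assignment over dictionary atoms stands in for attribute-guided re-gathering, and 1 minus the cosine similarity between same-index atoms stands in for the one-to-one alignment. The function names `nmf`, `regather` and `one_to_one_alignment_loss` are hypothetical.

```python
import numpy as np


def nmf(X, k, iters=200, eps=1e-9, seed=0):
    """Factor non-negative X (d x n) as D @ C, with D (d x k) and C (k x n).

    Assumption: "matrix decomposition" is modelled here as NMF with the
    Lee-Seung multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = rng.random((d, k)) + eps  # columns = hypothetical attribute atoms
    C = rng.random((k, n)) + eps  # codes: how strongly each position uses each atom
    for _ in range(iters):
        C *= (D.T @ X) / (D.T @ D @ C + eps)
        D *= (X @ C.T) / (D @ C @ C.T + eps)
    return D, C


def regather(X, C, temperature=1.0):
    """Pool positions (pixels or words) into one feature per atom.

    Hypothetical scheme: a softmax over atoms gives each position a soft
    assignment; each atom then averages the positions assigned to it.
    """
    logits = C / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    S = np.exp(logits)
    S /= S.sum(axis=0, keepdims=True)  # each position's weights sum to 1 over atoms
    W = S / (S.sum(axis=1, keepdims=True) + 1e-9)  # each atom's weights sum to 1 over positions
    return X @ W.T  # (d x k): one "co-occurrence attribute" vector per atom


def one_to_one_alignment_loss(A_v, A_t, eps=1e-9):
    """Mean (1 - cosine similarity) between the i-th visual and i-th textual attribute."""
    a = A_v / (np.linalg.norm(A_v, axis=0, keepdims=True) + eps)
    b = A_t / (np.linalg.norm(A_t, axis=0, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=0)))


# Toy usage with random non-negative features (not real CUHK-PEDES data).
rng = np.random.default_rng(42)
visual = rng.random((64, 49))   # 64-dim features for a 7x7 grid of pixels
textual = rng.random((64, 12))  # 64-dim features for a 12-word sentence
k = 6

D_v, C_v = nmf(visual, k)
D_t, C_t = nmf(textual, k)
denoised_v = D_v @ C_v  # low-rank reconstruction used as the "denoised" feature
denoised_t = D_t @ C_t

A_v = regather(visual, C_v)
A_t = regather(textual, C_t)
loss = one_to_one_alignment_loss(A_v, A_t)
print(f"alignment loss: {loss:.3f}")
```

In this toy the two dictionaries are fitted independently, so atom i of the visual dictionary has no reason to correspond to atom i of the textual one; the sketch only shows the shape of the computation. In an end-to-end model, a loss of this kind would be back-propagated to encourage such a correspondence.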

Supplementary Material

MP4 File (MM22-fp0046.mp4)

Person search by language aims to retrieve images of a pedestrian of interest given a free-form natural language sentence. To address the challenge of reporting bias, we propose a novel method named Cross-modal Co-occurrence Attributes Alignments. First, we construct visual and textual attribute dictionaries via matrix decomposition, and perform cross-modal alignments using denoised reconstruction features to suppress noise from pedestrian-unrelated elements. Second, guided by the learned attribute dictionaries, we re-gather image pixels and sentence words to adaptively form more discriminative co-occurrence attributes in both modalities; by accounting for relations across modalities, this further alleviates noise from non-corresponding attributes. The proposed method significantly outperforms existing solutions and achieves new state-of-the-art retrieval performance on two large-scale benchmarks, CUHK-PEDES and RSTPReid.


