
A Cross-modality and Progressive Person Search System

Authors:

Xiaodong Chen,

Wu Liu,

Xinchen Liu,

Yongdong Zhang,

Tao Mei

MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Pages 4550 - 4552

https://doi.org/10.1145/3394171.3414455

Published: 12 October 2020


Abstract

This demonstration presents an instant and progressive cross-modality person search system called 'CMPS'. With the system, users can quickly find lost children or elderly persons simply by describing their appearance through speech. Unlike most existing person search applications, which require considerable time to find probe images, CMPS saves valuable time in the early stage after a person goes missing. The proposed CMPS is one of the first attempts at instant and progressive person search that leverages the audio, text, and visual modalities together. In detail, the system first takes speech describing the appearance of a person as input and obtains a textual description through speech-to-text conversion. The cross-modal search is then performed by matching the textual embedding against the visual representations of images in the learned latent space. The retrieved images can be used as candidates for query expansion. If the candidates are not correct, the user can quickly adjust the description through speech. Once a correct image is found, the user can click it directly to use it as a new query. Finally, the system returns the complete track of the lost person with one click. On the purpose-built CUHK-PEDES-AUDIOS dataset, the system achieves 82.46% rank-1 accuracy at real-time speed. The code of CMPS is available at https://github.com/SheldongChen/Search-People-With-Audio.
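The matching step and the rank-1 metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that text and image embeddings already live in a shared latent space and that cosine similarity is the matching score; the abstract does not specify the similarity measure, and the function names, gallery layout, and toy vectors below are hypothetical, not taken from the CMPS code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_gallery(query_emb, gallery):
    """Return gallery image ids sorted by decreasing similarity to the query.

    gallery: dict mapping image id -> embedding (hypothetical layout).
    """
    return sorted(gallery,
                  key=lambda img_id: cosine(query_emb, gallery[img_id]),
                  reverse=True)

def rank1_accuracy(queries, gallery, identity_of):
    """Fraction of queries whose top-ranked image shows the correct person.

    queries: list of (text embedding, person id) pairs.
    identity_of: dict mapping image id -> person id.
    """
    hits = 0
    for emb, person in queries:
        top = rank_gallery(emb, gallery)[0]
        hits += identity_of[top] == person
    return hits / len(queries)

# Toy 2-D embeddings, assumed for illustration only.
gallery = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [0.7, 0.7]}
identity_of = {"img_a": "p1", "img_b": "p2", "img_c": "p3"}
queries = [([0.9, 0.1], "p1"), ([0.1, 0.9], "p2"), ([1.0, 0.2], "p3")]

ranking = rank_gallery([0.9, 0.1], gallery)
accuracy = rank1_accuracy(queries, gallery, identity_of)
```

In this toy setup two of the three text queries retrieve the correct person first, so the rank-1 accuracy is 2/3. A real system would replace the hand-written vectors with the outputs of learned text and image encoders.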

Supplementary Material

MP4 File (3394171.3414455.mp4)

Person search, or re-identification (Re-ID), is an important and challenging task in the multimedia and computer vision communities. With wide real-world applications such as intelligent video surveillance and smart retailing, the task aims to find the same person across images captured by multiple non-overlapping cameras. However, existing person search and Re-ID methods usually use images of a specific person as the probe, which limits them in urgent real-world scenarios. In this paper, we develop a simple, convenient, and real-time person search system. The system has two featured properties: 1) It provides a convenient input and interaction mode, taking the audio of speech as input to search for a target person captured by cameras. 2) It performs person search in a progressive manner to maintain both accuracy and speed: users can interactively input new queries and apply query expansion, which yields more accurate results in less time.
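The progressive interaction described above can be sketched as a two-stage loop: a text query produces candidates, and a candidate confirmed by the user becomes a new image query. This is an illustrative sketch only; the `pick` callback standing in for the user's click, the cosine score, the `top_k` value, and the toy embeddings are all assumptions rather than details of the CMPS implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query_emb, gallery, top_k=3):
    """Return the top_k gallery ids most similar to the query embedding."""
    ranked = sorted(gallery,
                    key=lambda img_id: cosine(query_emb, gallery[img_id]),
                    reverse=True)
    return ranked[:top_k]

def progressive_search(text_emb, gallery, pick):
    """Text search first, then query expansion with the clicked image.

    pick: hypothetical callback simulating the user; it receives the
    candidate ids and returns the id of a correct image, or None.
    """
    candidates = search(text_emb, gallery)
    clicked = pick(candidates)
    if clicked is None:
        # No correct candidate: the user would re-describe the person by speech.
        return candidates
    # Use the clicked image's visual embedding as the new query.
    others = {k: v for k, v in gallery.items() if k != clicked}
    return [clicked] + search(gallery[clicked], others, top_k=len(others))

# Toy embeddings from two cameras, assumed for illustration only.
gallery = {
    "cam1_p1": [1.0, 0.1],
    "cam2_p1": [0.9, 0.2],
    "cam1_p2": [0.0, 1.0],
    "cam2_p3": [0.5, 0.5],
}
text_emb = [0.8, 0.3]

first_stage = search(text_emb, gallery)
track = progressive_search(text_emb, gallery, pick=lambda cands: cands[0])
rejected = progressive_search(text_emb, gallery, pick=lambda cands: None)
```

Here the simulated user accepts the first candidate, and the second stage ranks the remaining images by visual similarity to it, which places the other image of the same person next. When no candidate is accepted, the function simply returns the text-based candidates so that the description can be refined.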


