Person Identification Based on Multichannel and Multimodality Fusion

Liu, Ming; Tang, Hao; Ning, Huazhong; Huang, Thomas

doi:10.1007/978-3-540-69568-4_21

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4122)

Included in the following conference series:

International Evaluation Workshop on Classification of Events, Activities and Relationships

Abstract

Person identity is very useful information for high-level video analysis and retrieval. In some scenarios, the recording is not only multimodal but also multichannel (microphone array, camera array). In this paper, we describe a multimodal person ID system based on multichannel and multimodal fusion. The audio-only system combines the recordings of 7 microphone channels at the decision-output level of the individual per-channel audio-only systems. The audio modeling technique is the Universal Background Model (UBM) with Maximum a Posteriori (MAP) adaptation, a framework that is very popular in the speaker recognition literature. The visual-only system works directly in the appearance space via the ℓ₁ norm and a nearest-neighbor classifier. Linear fusion then combines the two modalities to improve the ID performance. The experiments indicate the effectiveness of microphone array fusion and audio/visual fusion.
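
The abstract gives no implementation details, so the following is only a minimal sketch of how the three stages it names could fit together: ℓ₁ nearest-neighbor scoring for the visual modality, decision-level combination of per-channel audio scores, and linear audio/visual fusion. All function names, the averaging rule for channels, the min-max normalization, and the default fusion weight are assumptions made for illustration, not the authors' method. The per-channel audio scores are taken as given (for example, log-likelihood ratios from a UBM-MAP model, which is not implemented here).

```python
from typing import Dict, List, Sequence

Scores = Dict[str, float]  # identity label -> score (higher means more likely)


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two appearance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def visual_scores(query: Sequence[float],
                  gallery: Dict[str, List[Sequence[float]]]) -> Scores:
    """Nearest-neighbor score per identity: negated smallest L1 distance."""
    return {
        pid: -min(l1_distance(query, template) for template in templates)
        for pid, templates in gallery.items()
    }


def average_channels(channel_scores: List[Scores]) -> Scores:
    """Decision-level channel fusion, assumed here to be a plain average."""
    n = len(channel_scores)
    return {
        pid: sum(scores[pid] for scores in channel_scores) / n
        for pid in channel_scores[0]
    }


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so the two modalities are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {pid: 0.0 for pid in scores}
    return {pid: (value - lo) / (hi - lo) for pid, value in scores.items()}


def linear_fusion(audio: Scores, visual: Scores,
                  audio_weight: float = 0.5) -> Scores:
    """Weighted sum of normalized audio and visual scores."""
    a, v = min_max(audio), min_max(visual)
    return {
        pid: audio_weight * a[pid] + (1.0 - audio_weight) * v[pid]
        for pid in a
    }


def identify(scores: Scores) -> str:
    """Return the identity with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data: two enrolled people, three microphone channels.
    gallery = {"alice": [[0.1, 0.2, 0.3]], "bob": [[0.9, 0.8, 0.7]]}
    channels = [
        {"alice": -10.0, "bob": -12.0},
        {"alice": -11.0, "bob": -9.0},
        {"alice": -9.0, "bob": -12.0},
    ]
    audio = average_channels(channels)
    visual = visual_scores([0.2, 0.2, 0.4], gallery)
    print(identify(linear_fusion(audio, visual)))
```

In this sketch the fusion weight is a free parameter; in practice it would be tuned on held-out data, and the abstract does not say how the authors chose theirs.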

Author information

Authors and Affiliations

IFP Group, University of Illinois at Urbana-Champaign, Urbana, IL 61801,
Ming Liu, Hao Tang, Huazhong Ning & Thomas Huang

Editor information

Rainer Stiefelhagen, John Garofolo

About this paper

Cite this paper

Liu, M., Tang, H., Ning, H., Huang, T. (2007). Person Identification Based on Multichannel and Multimodality Fusion. In: Stiefelhagen, R., Garofolo, J. (eds) Multimodal Technologies for Perception of Humans. CLEAR 2006. Lecture Notes in Computer Science, vol 4122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69568-4_21

DOI: https://doi.org/10.1007/978-3-540-69568-4_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69567-7
Online ISBN: 978-3-540-69568-4
eBook Packages: Computer Science, Computer Science (R0)
