An Audio-Video Based IVA Algorithm for Source Separation and Evaluation on the AV16.3 Corpus

Liang, Yanfeng; Chambers, Jonathon

doi:10.1007/978-3-642-28551-6_41

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7191)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation

Abstract

The machine cocktail party problem has been researched for several decades. Although many blind source separation schemes have been proposed to address this problem, few of them have been tested on real-room audio-video recordings. In this paper, we propose an audio-video based independent vector analysis (AVIVA) method and compare it with other independent vector analysis (IVA) methods on a real-room recording dataset, the AV16.3 corpus. We also use a new method based on pitch difference detection to objectively evaluate the separation performance of the algorithms on this real dataset; the evaluation confirms the advantages of using the visual modality with IVA.
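The abstract does not give the AVIVA update rules, so the sketch below is not the authors' method. As a point of reference, it shows standard audio-only, frequency-domain IVA (the family of methods the abstract compares AVIVA against): a natural-gradient update with a spherical multivariate Laplacian score function, whose coupling across frequency bins is what lets IVA avoid the per-bin permutation problem. The function name `iva_natural_gradient`, the step size `mu`, the iteration count, the identity initialisation and the minimal-distortion rescaling at the end are all illustrative assumptions; AVIVA additionally exploits the visual modality, which is not modelled here.

```python
import numpy as np


def iva_natural_gradient(X, n_iter=200, mu=0.1, eps=1e-8):
    """Hypothetical sketch of audio-only natural-gradient IVA (not AVIVA).

    X: complex STFT mixtures, shape (K, M, T) = (frequency bins, mics, frames).
    Assumes a determined mixture (as many sources as microphones).
    Returns (Y, W): separated STFTs (K, M, T) and unmixing matrices (K, M, M).
    """
    K, M, T = X.shape
    # Start every frequency bin from the identity unmixing matrix (assumption).
    W = np.tile(np.eye(M, dtype=complex), (K, 1, 1))
    I = np.eye(M)
    for _ in range(n_iter):
        Y = W @ X  # per-bin source estimates, shape (K, M, T)
        # Norm of each source vector across ALL frequency bins; this shared
        # term couples the bins so that they stay aligned to the same source.
        r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0)) + eps  # shape (M, T)
        phi = Y / r  # score function of a spherical multivariate Laplacian
        # Natural-gradient step in every bin: dW = (I - E[phi y^H]) W.
        G = I - (phi @ np.conj(np.transpose(Y, (0, 2, 1)))) / T
        W = W + mu * (G @ W)
    # Fix the scaling ambiguity (minimal distortion principle, assumption):
    # W <- diag(inv(W)) W, which makes diag(inv(W)) equal to one.
    A = np.linalg.inv(W)
    W = np.diagonal(A, axis1=1, axis2=2)[:, :, None] * W
    return W @ X, W
```

In use, `X` would come from an STFT of the microphone signals and each separated source would be resynthesised from `Y[:, n, :]` with the inverse STFT; the pitch-difference evaluation described in the abstract is not reproduced here.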

Author information

Authors and Affiliations

School of Electronic, Electrical and Systems Engineering, Loughborough University, UK
Yanfeng Liang & Jonathon Chambers

Editor information

Fabian Theis, Andrzej Cichocki, Arie Yeredor, Michael Zibulevsky

About this paper

Cite this paper

Liang, Y., Chambers, J. (2012). An Audio-Video Based IVA Algorithm for Source Separation and Evaluation on the AV16.3 Corpus. In: Theis, F., Cichocki, A., Yeredor, A., Zibulevsky, M. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2012. Lecture Notes in Computer Science, vol 7191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28551-6_41

DOI: https://doi.org/10.1007/978-3-642-28551-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28550-9
Online ISBN: 978-3-642-28551-6
eBook Packages: Computer Science, Computer Science (R0)
