Abstract
The machine cocktail party problem has been researched for several decades. Although many blind source separation schemes have been proposed to address it, few have been tested on real-room audio-video recordings. In this paper, we propose an audio-video based independent vector analysis (AVIVA) method and compare it with other independent vector analysis (IVA) methods on a real-room recording dataset, the AV16.3 corpus. We also introduce a new method based on pitch difference detection to objectively evaluate the separation performance of the algorithms on this real dataset. This evaluation confirms the advantages of using the visual modality with IVA.
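The abstract does not give the update rule, so as background here is a minimal sketch of the standard batch natural-gradient IVA update (after Kim et al., 2007), the family of methods AVIVA is compared against. It assumes the mixtures are already in the STFT domain as a complex array of shape (bins, sensors, frames). It uses the spherical multivariate score function y/||y||, where the norm runs over all frequency bins of one source; this shared norm is what ties the bins of a source together and mitigates the permutation problem. The step size `mu`, the iteration count, and the `W_init` argument are illustrative assumptions. How AVIVA actually injects the visual modality is not described here, and `W_init` is only a hypothetical hook for, e.g., a geometrically derived initialization.

```python
import numpy as np


def natural_gradient_iva(X, n_iter=200, mu=0.1, W_init=None):
    """Sketch of batch natural-gradient IVA (after Kim et al., 2007).

    X: complex array of shape (K, N, T): K frequency bins, N sensors, T frames.
    W_init: optional (K, N, N) initial unmixing matrices (hypothetical hook,
            e.g. for a visually or geometrically derived initialization).
    Returns the unmixing matrices W (K, N, N) and the outputs Y (K, N, T).
    """
    K, N, T = X.shape
    if W_init is None:
        # Start every frequency bin from the identity matrix.
        W = np.tile(np.eye(N, dtype=complex), (K, 1, 1))
    else:
        W = np.array(W_init, dtype=complex)
    I = np.eye(N)
    for _ in range(n_iter):
        Y = W @ X  # per-bin separation: (K, N, N) @ (K, N, T) -> (K, N, T)
        # Multivariate score: each source is normalized by its norm taken
        # jointly over all K bins, which couples the bins of that source.
        norm = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0, keepdims=True)) + 1e-12
        phi = Y / norm
        # Sample estimate of E[phi(y) y^H] for every bin: (K, N, N).
        C = phi @ Y.conj().transpose(0, 2, 1) / T
        # Natural-gradient step: W <- W + mu * (I - E[phi y^H]) W.
        W = W + mu * (I - C) @ W
    return W, W @ X
```

In a full pipeline the outputs would still need a scaling correction (for example, the minimal distortion principle) and an inverse STFT before listening or evaluation.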
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, Y., Chambers, J. (2012). An Audio-Video Based IVA Algorithm for Source Separation and Evaluation on the AV16.3 Corpus. In: Theis, F., Cichocki, A., Yeredor, A., Zibulevsky, M. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2012. Lecture Notes in Computer Science, vol 7191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28551-6_41
DOI: https://doi.org/10.1007/978-3-642-28551-6_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28550-9
Online ISBN: 978-3-642-28551-6
eBook Packages: Computer Science, Computer Science (R0)