An Audio-Video Based IVA Algorithm for Source Separation and Evaluation on the AV16.3 Corpus

  • Conference paper
Latent Variable Analysis and Signal Separation (LVA/ICA 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7191)

Abstract

The machine cocktail party problem has been researched for several decades. Although many blind source separation schemes have been proposed to address it, few have been tested on real-room audio-video recordings. In this paper, we propose an audio-video based independent vector analysis (AVIVA) method and compare it with other independent vector analysis (IVA) methods on a real-room recording dataset, the AV16.3 corpus. We also use a new method based on pitch difference detection to objectively evaluate the separation performance of the algorithms on the real dataset; the results confirm the advantages of using the visual modality with IVA.

Editor information

Fabian Theis, Andrzej Cichocki, Arie Yeredor, Michael Zibulevsky

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liang, Y., Chambers, J. (2012). An Audio-Video Based IVA Algorithm for Source Separation and Evaluation on the AV16.3 Corpus. In: Theis, F., Cichocki, A., Yeredor, A., Zibulevsky, M. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2012. Lecture Notes in Computer Science, vol 7191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28551-6_41

  • DOI: https://doi.org/10.1007/978-3-642-28551-6_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28550-9

  • Online ISBN: 978-3-642-28551-6

  • eBook Packages: Computer Science, Computer Science (R0)
