Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation

  • Conference paper
Biometric ID Management and Multimodal Communication (BioID 2009)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5707)

Abstract

In recent years, the established link between the various domains of human communication production has been increasingly exploited in speech processing. In this work, a state-of-the-art Semi Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel-based speech segmentation system is employed to segment speech automatically. Canonical Correlation Analysis (CCA) of segmented and non-segmented data across a range of noisy speech environments shows that segmented speech has a significantly higher audiovisual correlation, demonstrating the feasibility of our techniques for further development as part of a proposed audiovisual speech enhancement system.
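The correlation measure named in the abstract is CCA, which finds paired linear projections of two feature sets that are maximally correlated. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation: it assumes per-frame audio features `X` and lip-shape features `Y` already aligned to a common frame rate, with more frames than feature dimensions. It uses the QR/SVD formulation of CCA (the singular values of `Qx.T @ Qy` are the canonical correlations), which avoids explicitly inverting covariance matrices. The function names, the `(start, end)` segment format, and the choice to average the first canonical correlation over segments are hypothetical illustrations; the paper's exact features and pooling are not stated here.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two feature sets (classical CCA).

    X: (n_frames, p) array, e.g. audio features per frame.
    Y: (n_frames, q) array, e.g. lip-shape features per frame.
    Assumes n_frames > p + q and full column rank after centring.
    Returns min(p, q) correlations in descending order.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of frames")
    # Centre each feature so that correlations, not raw products, are measured.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two column spaces (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two spaces, i.e. the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def mean_segment_correlation(X, Y, segments):
    """Average first canonical correlation over half-open frame ranges.

    `segments` is a hypothetical list of (start, end) frame indices, e.g.
    vowel regions from a segmenter. Segments with too few frames for CCA
    (fewer than p + q + 1) are skipped; returns NaN if none remain.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    min_len = X.shape[1] + Y.shape[1] + 1
    values = []
    for a, b in segments:
        xs, ys = X[a:b], Y[a:b]
        if len(xs) >= min_len:  # need more frames than feature dimensions
            values.append(canonical_correlations(xs, ys)[0])
    return float(np.mean(values)) if values else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    lips = rng.normal(size=(n, 3))            # synthetic stand-in visual features
    audio = lips @ rng.normal(size=(3, 4))    # audio partly driven by lip shape
    audio += 0.5 * rng.normal(size=(n, 4))    # additive synthetic noise
    print("whole utterance:", canonical_correlations(audio, lips)[0])
    segs = [(i, i + 50) for i in range(0, n, 50)]
    print("mean over segments:", mean_segment_correlation(audio, lips, segs))
```

The data in the `__main__` block is synthetic and only shows how the functions are called. When comparing segmented with non-segmented data in practice, note that CCA on short segments is biased upwards when the number of frames is close to `p + q`, so segment lengths should be kept well above the feature dimensionality.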

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abel, A., Hussain, A., Nguyen, QD., Ringeval, F., Chetouani, M., Milgram, M. (2009). Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_9

  • DOI: https://doi.org/10.1007/978-3-642-04391-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04390-1

  • Online ISBN: 978-3-642-04391-8

  • eBook Packages: Computer Science, Computer Science (R0)
