Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation

Abel, Andrew; Hussain, Amir; Nguyen, Quoc-Dinh; Ringeval, Fabien; Chetouani, Mohamed; Milgram, Maurice

doi:10.1007/978-3-642-04391-8_9

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5707)

Included in the following conference series:

European Workshop on Biometrics and Identity Management

Abstract

In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state-of-the-art Semi Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel-based speech segmentation system is employed to segment speech automatically. Canonical Correlation Analysis (CCA) on segmented and non-segmented data in a range of noisy speech environments finds that segmented speech has a significantly better audiovisual correlation, demonstrating the feasibility of our techniques for further development as part of a proposed audiovisual speech enhancement system.
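To illustrate the kind of measurement the abstract refers to, the following is a minimal sketch of CCA between two feature sets, written with NumPy. It is not the authors' implementation: the function name `canonical_correlations`, the regularisation parameter `reg`, and the synthetic "audio" and "visual" feature matrices are all assumptions made for this example.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Return the canonical correlations between X (n x p) and Y (n x q).

    Hypothetical helper for illustration: rows are time-aligned frames,
    columns are features (e.g. audio features in X, lip features in Y).
    """
    # Centre each feature so covariances are computed about the mean.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance blocks; a small ridge term keeps them invertible.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    # Whiten with Cholesky factors: M = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T

    # The singular values of M are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(M, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic stand-ins for aligned audio and visual feature frames.
rng = np.random.default_rng(0)
audio = rng.standard_normal((500, 4))
mixing = rng.standard_normal((4, 2))
visual = audio @ mixing + 0.5 * rng.standard_normal((500, 2))
correlations = canonical_correlations(audio, visual)
```

Under these assumptions, a comparison such as the one described in the abstract could be made by calling the function once on frames from the whole utterance and once on frames restricted to the segmented regions, then comparing the leading canonical correlations.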

Author information

Authors and Affiliations

Dept. of Computing Science, University of Stirling, Scotland, UK
Andrew Abel & Amir Hussain
Institute of Intelligent Systems and Robotics, University Pierre and Marie Curie (Paris 6), 4 Place Jussieu, Paris, France
Quoc-Dinh Nguyen, Fabien Ringeval, Mohamed Chetouani & Maurice Milgram

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco Tomas y Valiente 11, 28049, Madrid, Spain
Julian Fierrez & Javier Ortega-Garcia
Second University of Naples, and IIASS, Via Vivaldi 43, 81100, Caserta, Italy
Anna Esposito
EPFL, Speech Processing and Biometrics Group, EPFL-STI-IEL-LIDIAP, ELE 233, Station 11, 1015, Lausanne, Switzerland
Andrzej Drygajlo
Escola Universitària Politècnica de Mataró, Avda. Puig i Cadafalch 101-111, 08303, Mataro (Barcelona), Spain
Marcos Faundez-Zanuy

About this paper

Cite this paper

Abel, A., Hussain, A., Nguyen, QD., Ringeval, F., Chetouani, M., Milgram, M. (2009). Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_9

DOI: https://doi.org/10.1007/978-3-642-04391-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8
eBook Packages: Computer Science, Computer Science (R0)
