Abstract
In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state-of-the-art Semi Adaptive Appearance Model (SAAM) approach developed by the authors is used for automatic lip tracking, and an adapted version of our vowel-based speech segmentation system is employed to automatically segment speech. Canonical Correlation Analysis (CCA) on segmented and non-segmented data in a range of noisy speech environments finds that segmented speech has a significantly better audiovisual correlation, demonstrating the feasibility of our techniques for further development as part of a proposed audiovisual speech enhancement system.
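As a rough illustration of the CCA step mentioned in the abstract, the sketch below computes canonical correlations between two row-aligned feature matrices using a QR-plus-SVD formulation. It is not the authors' implementation: the function name `canonical_correlations`, the synthetic `audio` and `visual` matrices, and their dimensions are all assumptions made for the example, and real inputs would be per-frame audio and lip features.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between row-aligned matrices X (n x p) and Y (n x q).

    Assumes both matrices have full column rank and n > max(p, q).
    """
    # Centre each feature so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Hypothetical data: one shared latent signal plus independent noise dimensions.
rng = np.random.default_rng(0)
n = 500
latent = rng.standard_normal((n, 1))
audio = np.hstack([latent + 0.1 * rng.standard_normal((n, 1)),
                   rng.standard_normal((n, 3))])   # stand-in for audio features
visual = np.hstack([latent + 0.1 * rng.standard_normal((n, 1)),
                    rng.standard_normal((n, 2))])  # stand-in for lip features

corrs = canonical_correlations(audio, visual)
print(corrs)
```

In a comparison like the one the abstract describes, the same function would be applied to segmented and non-segmented feature sets and the leading correlations compared. The synthetic data here only shows that a shared component yields a high first canonical correlation.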
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abel, A., Hussain, A., Nguyen, QD., Ringeval, F., Chetouani, M., Milgram, M. (2009). Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_9
DOI: https://doi.org/10.1007/978-3-642-04391-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8
eBook Packages: Computer Science (R0)