Abstract
This paper proposes a lip-tracking method for an audio-visual speech recognition system. The presented method consists of a face detector, a face tracker, a lip detector, a lip tracker, and a word classifier. In speech recognition systems the audio signal is exposed to large amounts of acoustic noise, so researchers are looking for ways to reduce the influence of audio interference on recognition results. Visual speech is a source of information that is not perturbed by the acoustic environment or noise. Analyzing visual speech requires a method of lip tracking. This work presents a method for automatic detection of the outer edges of the lips, which is used to recognize individual words in audio-visual speech recognition. Additionally, the paper shows how visual speech can be used to segment the audio signal into phonemes.
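The abstract does not disclose the authors' detection algorithm, but a common baseline for the lip-detector stage of such a pipeline is color-based segmentation: lip pixels are typically redder than the surrounding skin, so thresholding the red-channel dominance inside a face region yields a candidate lip bounding box. The sketch below is a minimal, hypothetical illustration of that idea only; the function name, the `red_margin` threshold, and the nested-list frame representation are assumptions, not the paper's method.

```python
def detect_lip_region(frame, red_margin=30):
    """Return (top, left, bottom, right) of the bounding box of pixels
    whose red channel exceeds both green and blue by at least
    `red_margin`, or None if no pixel qualifies.

    `frame` is a list of rows, each row a list of (r, g, b) tuples,
    e.g. a cropped face region from a video frame.
    """
    # Collect coordinates of "red-dominant" pixels (a crude lip cue).
    hits = [(y, x)
            for y, row in enumerate(frame)
            for x, (r, g, b) in enumerate(row)
            if r - g >= red_margin and r - b >= red_margin]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    # Tight bounding box around all red-dominant pixels.
    return (min(ys), min(xs), max(ys), max(xs))
```

In a full system this box would only seed the lip tracker; a production detector would combine the color cue with shape models or trained classifiers (e.g. Haar cascades) rather than rely on thresholding alone.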
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Kubanek, M., Bobulski, J., Adrjanowicz, L. (2012). Lip Tracking Method for the System of Audio-Visual Polish Speech Recognition. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science(), vol 7267. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29347-4_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29346-7
Online ISBN: 978-3-642-29347-4