Abstract
Visual speech recognition aims to improve speech recognition for human-computer interaction. Motivated by the human ability to lip-read, visual speech recognition systems take into account the movement of visible speech articulators to classify the spoken word. However, while most research has focused on lip movement, the contribution of other facial regions has received little attention. This paper studies the effect of movement in the area around the lips on the accuracy of speech classification. Two sets of visual features are derived: one corresponds to parameters of an accurate lip contour, while the other also takes into account the area around the lips. The features are classified using data mining algorithms in WEKA. The results show that features incorporating the area around the lips improve the ability of machines to recognize the spoken word.
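The comparison described in the abstract can be illustrated with a minimal sketch. The paper itself uses data mining algorithms in WEKA; the nearest-centroid classifier, the synthetic feature vectors, and the class labels below are hypothetical stand-ins chosen only to show the experimental procedure of comparing a lip-only feature set against a lip-plus-periphery feature set.

```python
import math
import random

def nearest_centroid_accuracy(train, test):
    """Train a nearest-centroid classifier and return test accuracy.

    `train` and `test` are lists of (feature_vector, label) pairs.
    """
    # Accumulate per-class sums to compute class centroids.
    sums, counts = {}, {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}

    # Classify each test vector by its nearest class centroid.
    correct = 0
    for vec, label in test:
        pred = min(centroids, key=lambda lbl: math.dist(vec, centroids[lbl]))
        correct += (pred == label)
    return correct / len(test)

# Synthetic data: two word classes; four weakly discriminative lip-contour
# features plus one strongly discriminative periphery feature (hypothetical).
random.seed(0)
data = []
for label, shift in [("word_a", 0.0), ("word_b", 0.4)]:
    for _ in range(60):
        lip = [random.gauss(shift, 0.5) for _ in range(4)]
        periphery = [random.gauss(4 * shift, 0.5)]
        data.append((lip, lip + periphery, label))
random.shuffle(data)
train, test = data[:80], data[80:]

acc_lip = nearest_centroid_accuracy([(l, y) for l, _, y in train],
                                    [(l, y) for l, _, y in test])
acc_full = nearest_centroid_accuracy([(f, y) for _, f, y in train],
                                     [(f, y) for _, f, y in test])
print(f"lip-only accuracy:      {acc_lip:.2f}")
print(f"lip+periphery accuracy: {acc_full:.2f}")
```

On this synthetic data the augmented feature set scores at least as well as the lip-only set, mirroring the paper's finding; the actual feature definitions and classifiers are those of the paper, not this sketch.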
References
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active Shape Models: Their Training and Application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
Faruquie, T.A., Majumdar, A., Rajput, N., Subramaniam, L.V.: Large vocabulary audio-visual speech recognition using active shape models. In: International Conference on Pattern Recognition, vol. 3, pp. 106–109 (2000)
Feng, X., Wang, W.: DTCWT-based dynamic texture features for visual speech recognition. In: IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2008), pp. 497–500 (2008)
Gordan, M., Kotropoulos, C., Pitas, I.: A Support Vector Machine-Based Dynamic Network for Visual Speech Recognition Applications. EURASIP Journal on Applied Signal Processing 2002(11), 1248–1259 (2002)
Gupta, D., Singh, P., Laxmi, V., Gaur, M.S.: Comparison of Parametric Visual Features For Speech Recognition. In: Proceedings of the IEEE International Conference on Network Communication and Computer (ICNCC 2011), pp. 432–435 (2011)
Kulkarni, A.D.: Artificial neural networks for image understanding. Van Nostrand Reinhold, New York (1994)
Kumar, K., Chen, T.H., Stern, R.M.: Profile View Lip Reading. In: International Conference on Acoustics, Speech, and Signal Processing, pp. IV: 429–432 (2007)
Lee, J.S., Park, C.H.: Hybrid Simulated Annealing and Its Application to Optimization of Hidden Markov Models for Visual Speech Recognition. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 40(4), 1188–1196 (2010)
Liew, A., Leung, S.H., Lau, W.H.: Lip contour extraction from colour images using a deformable model. Pattern Recognition 35(12), 2949–2962 (2002)
Matthews, I., Cootes, T.F., Bangham, J.A., Cox, S., Harvey, R.: Extraction of Visual Features for Lipreading. IEEE Trans. Pattern Analysis and Machine Intelligence 24(2), 198–213 (2002)
Neely, K.: Effect of visual factors on the intelligibility of speech. Journal of the Acoustical Society of America 28(6), 1275–1277 (1956)
Petajan, E., Bischoff, B., Bodoff, D., Brooke, N.M.: An improved automatic lipreading system to enhance speech recognition. In: Proceedings of the SIGCHI conference on Human factors in computing systems (CHI 1988), pp. 19–25 (1988)
Potamianos, G., Neti, C., Gravier, G., Garg, A., Senior, A.W.: Recent Advances in the Automatic Recognition of Audiovisual Speech. Proceedings of the IEEE 91(9), 1306–1326 (2003)
Saitoh, T., Konishi, R.: Lip Reading using Video and Thermal Images. In: Proceedings of the International Joint Conference (SICE-ICASE 2006), pp. 5011–5015 (2006)
Singh, P., Laxmi, V., Gupta, D., Gaur, M.S.: Lipreading Using Gram Feature Vector. In: Advances in Soft Computing, vol. 85, pp. 81–88. Springer, Heidelberg (2010)
University of Waikato: WEKA, Open Source Machine Learning Software, http://www.cs.waikato.ac.nz/ml/weka/
Yau, W.C., Weghorn, H., Kumar, D.K.: Visual Speech Recognition and Utterance Segmentation Based on Mouth Movement. In: 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications, pp. 7–14 (2007)
Zhang, X., Mersereau, R.M., Clements, M., Broun, C.C.: Visual speech feature extraction for improved speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), vol. 2 (2002)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, P., Gupta, D., Laxmi, V., Gaur, M.S. (2011). Contribution of Oral Periphery on Visual Speech Intelligibility. In: Abraham, A., Lloret Mauri, J., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 191. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22714-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22713-4
Online ISBN: 978-3-642-22714-1
eBook Packages: Computer Science, Computer Science (R0)