Speaker Dependent Visual Speech Recognition by Symbol and Real Value Assignment

Ju, Jeongwoo; Jung, Heechul; Kim, Junmo

doi:10.1007/978-3-642-37374-9_98

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 208)

Abstract

In this paper, we propose a visual speech recognition method based on symbol or real-value assignment. Our method is inspired by the Bag of Words (BoW) model [1], which is usually applied to object matching problems. In the BoW model, a codebook is produced by K-means clustering, and a feature vector extracted from an image is converted to the corresponding symbol. Similarly, we generate a codebook by running the K-means algorithm on a pool of pHog (Pyramid Histogram of Oriented Gradients) feature vectors extracted from a subset of a lip database. Each remaining lip image is then assigned a value based on its chi-square distance to each cluster. Depending on the type of this value, we suggest two methods for assigning it to a lip image frame. The first method finds the cluster containing the element image with the minimum chi-square distance to the frame being processed and assigns that cluster's label to the frame. The second method calculates the distances between the frame and all cluster centroids, which yields a multi-dimensional vector that directly becomes the value assigned to the frame. With these methods, each time sequence is converted into either a symbolized sequence or a multi-dimensional real-valued sequence. To measure the similarity between two time sequences, we use Dynamic Time Warping for real-valued sequences and edit distance for symbolized sequences.
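
The two assignment methods and the two sequence comparisons described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the symmetric form of the chi-square distance, the Euclidean local cost inside Dynamic Time Warping, and the unit costs in the edit distance are all assumptions, since the abstract does not specify them. Feature extraction and K-means clustering are omitted; the codebook (cluster members and centroids) is assumed to be given.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def chi_square(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Symmetric chi-square distance between two histograms (assumed form)."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def assign_symbol(frame: Vector, members: List[Tuple[int, Vector]]) -> int:
    """Method 1: label of the cluster whose member is closest to the frame.

    `members` is a hypothetical list of (cluster_label, feature_vector) pairs.
    """
    label, _ = min(members, key=lambda m: chi_square(frame, m[1]))
    return label


def assign_real(frame: Vector, centroids: List[Vector]) -> List[float]:
    """Method 2: vector of distances from the frame to every centroid."""
    return [chi_square(frame, c) for c in centroids]


def edit_distance(s: Sequence[int], t: Sequence[int]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs assumed)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]


def dtw(p: List[Vector], q: List[Vector]) -> float:
    """Dynamic Time Warping cost between two vector sequences.

    The local cost is the Euclidean distance, which is an assumption.
    """
    inf = float("inf")
    n, m = len(p), len(q)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(p[i - 1], q[j - 1])))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Under these assumptions, a sequence of lip frames would be mapped frame by frame with `assign_symbol` and compared with `edit_distance`, or mapped with `assign_real` and compared with `dtw`.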

References

1. Sivic, J., Zisserman, A.: Video Google: A Text Retrieval Approach to Object Matching in Videos. In: Proc. Ninth Int'l Conf. Computer Vision, pp. 1470–1478 (2003)
2. Dupont, S., Luettin, J.: Audio-visual speech modeling for continuous speech recognition. IEEE Trans. Multimedia 2, 141–151 (2000)
3. Potamianos, G., Neti, C., Luettin, J., Matthews, I.: Audio-Visual Automatic Speech Recognition: An Overview. In: Bailly, G., Vatikiotis-Bateson, E., Perrier, P. (eds.) Issues in Visual and Audio-Visual Speech Processing. MIT Press (2004)
4. Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., Mashari, A., Zhou, J.: Audio-visual speech recognition. In: Final Workshop 2000 Report, vol. 764 (2000)
5. Zhao, G., Barnard, M., Pietikainen, M.: Lipreading with local spatiotemporal descriptors. IEEE Transactions on Multimedia 11(7), 1254–1265 (2009)
6. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
7. Bai, Y., Guo, L., Jin, L., Huang, Q.: A novel feature extraction method using pyramid histogram of orientation gradients for smile recognition. In: 2009 16th IEEE International Conference on Image Processing (ICIP), vol. 2, pp. 3305–3308. IEEE (2009)
8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. John Wiley, New York (2001)
9. Ten Holt, G.A., Reinders, M.J.T., Hendriks, E.A.: Multi-dimensional dynamic time warping for gesture recognition. In: Proc. of the Conference of the Advanced School for Computing and Imaging, ASCI 2007 (2007)
10. Senin, P.: Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA (2008)
11. Neuhaus, M., Bunke, H.: Edit distance based kernel functions for structural pattern classification. Pattern Recognition 39(10), 1852–1863 (2006)
12. Bahlmann, C., Haasdonk, B., Burkhardt, H.: On-line handwriting recognition with support vector machines - a kernel approach. In: Proc. 8th Int. Workshop Front. Handwriting Recognition (IWFHR), pp. 49–54 (2002)

Author information

Authors and Affiliations

Division of Future Vehicle, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Korea
Jeongwoo Ju
Dept. of Electrical Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Korea
Heechul Jung & Junmo Kim

Corresponding author

Correspondence to Jeongwoo Ju.

Editor information

Editors and Affiliations

Electrical Engineering and Computer Sci., KAIST, 291 Daehak-ro, Daejeon, 305-701, Republic of Korea
Jong-Hwan Kim
Dept. Computer & Information Technology (ICT), Purdue University, N. Grant St. 401, West Lafayette, 47907-1421, Indiana, USA
Eric T. Matson
Dept. of Civil and Environmental Engg., KAIST, Daehak-ro 291, Daejeon, 305-701, Republic of Korea
Hyun Myung
Faculty of Engineering, The University of Auckland, Private Bag, Auckland, 1142, New Zealand
Peter Xu

About this chapter

Cite this chapter

Ju, J., Jung, H., Kim, J. (2013). Speaker Dependent Visual Speech Recognition by Symbol and Real Value Assignment. In: Kim, JH., Matson, E., Myung, H., Xu, P. (eds) Robot Intelligence Technology and Applications 2012. Advances in Intelligent Systems and Computing, vol 208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37374-9_98

DOI: https://doi.org/10.1007/978-3-642-37374-9_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37373-2
Online ISBN: 978-3-642-37374-9
eBook Packages: Engineering, Engineering (R0)
