Abstract
In this paper, we propose a visual speech recognition method based on symbol or real-value assignment. The method is inspired by the Bag of Words (BoW) model [1], which is commonly applied to object matching. In the BoW model, a codebook is produced by K-means clustering, and each feature vector extracted from an image is converted to the corresponding symbol. Similarly, we generate a codebook by running the K-means algorithm on a pool of pHog (Pyramid Histogram of Oriented Gradients) feature vectors extracted from a subset of a lip database. Each remaining lip image is then assigned a value by comparing its chi-square distance to each cluster. Depending on the type of this value, we suggest two ways of assigning it to a lip image frame. The first finds the cluster containing the element image with the minimum chi-square distance to the frame being processed and assigns that cluster's label to the frame. The second computes the distances between the frame and all cluster centroids, yielding a multi-dimensional vector that directly becomes the value assigned to the frame. With these methods, each time sequence is converted into either a symbol sequence or a multi-dimensional real-valued sequence. To measure the similarity between two time sequences, we use Dynamic Time Warping for real-valued sequences and edit distance for symbol sequences.
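The two assignment schemes and the two sequence distances described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the toy three-bin histograms standing in for pHog features, the hand-written cluster labels standing in for K-means output, the particular chi-square formula (one common variant with a factor of 0.5), and the Euclidean local cost inside the DTW recursion.

```python
import numpy as np

def chi_square(a, b, eps=1e-10):
    """Chi-square distance between two non-negative histograms (one common variant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def assign_symbol(frame, members, labels):
    """Method 1: label of the cluster that owns the nearest member image."""
    dists = [chi_square(frame, m) for m in members]
    return int(labels[int(np.argmin(dists))])

def assign_vector(frame, centroids):
    """Method 2: vector of distances from the frame to every cluster centroid."""
    return np.array([chi_square(frame, c) for c in centroids])

def edit_distance(s, t):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (a != b)))  # substitution or match
        prev = cur
    return prev[-1]

def dtw(x, y):
    """DTW cost between two sequences of real-valued vectors.

    The local cost is Euclidean distance, which is an assumption of this sketch.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(x[i - 1], dtype=float)
                                  - np.asarray(y[j - 1], dtype=float))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# Toy codebook: four hypothetical feature histograms in two clusters.
members = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
                    [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]])
labels = np.array([0, 0, 1, 1])  # stand-in for K-means cluster labels
centroids = np.array([members[labels == k].mean(axis=0) for k in (0, 1)])

frame = np.array([0.7, 0.3, 0.0])                # hypothetical unseen lip frame
symbol = assign_symbol(frame, members, labels)   # symbol assignment
vector = assign_vector(frame, centroids)         # real-value assignment
```

Applying `assign_symbol` to every frame of an utterance yields a symbol sequence that can be compared with `edit_distance`, while applying `assign_vector` yields a sequence of vectors that can be compared with `dtw`. A nearest-neighbour rule over either distance would then give a simple recognizer, although the abstract does not state which classifier the authors used.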
References
Sivic, J., Zisserman, A.: Video Google: A Text Retrieval Approach to Object Matching in Videos. In: Proc. Ninth Int’l Conf. Computer Vision, pp. 1470–1478 (2003)
Dupont, S., Luettin, J.: Audio-visual speech modeling for continuous speech recognition. IEEE Trans. Multimedia 2, 141–151 (2000)
Potamianos, G., Neti, C., Luettin, J., Matthews, I.: Audio-Visual Automatic Speech Recognition: An Overview. In: Bailly, G., Vatikiotis-Bateson, E., Perrier, P. (eds.) Issues in Visual and Audio-Visual Speech Processing. MIT Press (2004)
Neti, C., Potamianos, G., Luettin, J., Matthews, I., Glotin, H., Vergyri, D., Sison, J., Mashari, A., Zhou, J.: Audio-visual speech recognition. In: Final Workshop 2000 Report, vol. 764 (2000)
Zhao, G., Barnard, M., Pietikainen, M.: Lipreading with local spatiotemporal descriptors. IEEE Transactions on Multimedia 11(7), 1254–1265 (2009)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
Bai, Y., Guo, L., Jin, L., Huang, Q.: A novel feature extraction method using pyramid histogram of orientation gradients for smile recognition. In: 2009 16th IEEE International Conference on Image Processing (ICIP), vol. 2, pp. 3305–3308. IEEE (2009)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification. John Wiley, New York (2001)
Ten Holt, G.A., Reinders, M.J.T., Hendriks, E.A.: Multi-dimensional dynamic time warping for gesture recognition. In: Proc. of the Conference of the Advanced School for Computing and Imaging, ASCI 2007 (2007)
Senin, P.: Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA (2008)
Neuhaus, M., Bunke, H.: Edit distance based kernel functions for structural pattern classification. Pattern Recognition 39(10), 1852–1863 (2006)
Bahlmann, C., Haasdonk, B., Burkhardt, H.: On-line handwriting recognition with support vector machines—a kernel approach. In: Proc. 8th Int. Workshop Front. Handwriting Recognition (IWFHR), pp. 49–54 (2002)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ju, J., Jung, H., Kim, J. (2013). Speaker Dependent Visual Speech Recognition by Symbol and Real Value Assignment. In: Kim, JH., Matson, E., Myung, H., Xu, P. (eds) Robot Intelligence Technology and Applications 2012. Advances in Intelligent Systems and Computing, vol 208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37374-9_98
DOI: https://doi.org/10.1007/978-3-642-37374-9_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37373-2
Online ISBN: 978-3-642-37374-9
eBook Packages: Engineering, Engineering (R0)