Abstract
This paper studies the recognition of human mouth-states (open or closed), with the aim of detecting the number of audio sources and improving the speech recognition capability of an intelligent robot auditory system. We propose a mouth-state recognition algorithm that combines a learned discriminative dictionary and sparse representation with the homotopy method. The label consistent K-SVD (LC-KSVD) algorithm is used to learn a single discriminative over-complete dictionary and an optimal linear classifier simultaneously, and the homotopy algorithm is used at the sparse decomposition stage. Experiments are carried out on a database built from region-of-interest (ROI) images localized in and extracted from face images downloaded via Google. Compared with several state-of-the-art methods, the proposed method achieves higher classification rates (CRs), takes less time to recognize a test sample, and shows good noise immunity. In particular, it performs better when training samples are extremely limited, down to one sample per class.
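The recognition stage described above can be illustrated with a minimal sketch. It assumes a dictionary D and a linear classifier W have already been learned (for example by LC-KSVD), computes a sparse code for a test sample, and assigns the class with the largest classifier score. Several things here are assumptions for illustration and are not taken from the paper: the function names `sparse_code_ista` and `classify`, the penalty value `lam`, the toy D and W, and the label convention (0 = closed, 1 = open). Note also that the ℓ1 problem is solved with plain iterative soft-thresholding (ISTA) as a simple stand-in; it is not the homotopy solver used in the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(D, y, lam=0.05, n_iter=200):
    """Approximately solve min_x 0.5*||y - D x||_2^2 + lam*||x||_1 with ISTA.

    ISTA is a stand-in here; the paper uses a homotopy solver for this step.
    """
    step_bound = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - grad / step_bound, lam / step_bound)
    return x

def classify(D, W, y, lam=0.05):
    """Sparse-code y over D, then pick the class with the largest score W @ x."""
    x = sparse_code_ista(D, y, lam)
    scores = W @ x
    return int(np.argmax(scores)), x

# Toy setup (hypothetical, and not over-complete, to keep the result easy to follow):
# atoms 0-1 are tied to class 0 ("closed"), atoms 2-3 to class 1 ("open").
D = np.eye(4)
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

y = np.array([0.0, 0.0, 0.9, 0.1])  # test sample built mostly from "open" atoms
label, x = classify(D, W, y)
print(label, x)
```

In this toy case the sample is explained by the atoms tied to class 1, so the classifier scores favor the "open" label. In practice D would be over-complete and learned jointly with W, and the choice of sparse solver mainly affects speed rather than the decision rule.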
Acknowledgments
Our work was supported by the National Natural Science Foundation of China (61162014, 61210306074), the Natural Science Foundation of Jiangxi Province (20122BAB201029), the Science & Technology Project of Jiangxi Provincial Department of Education (GJJ13008), the Science and Technology Program of Jiangxi Provincial Department of Education (GJJ14135, GJJ14583) and the Graduate Student Innovation Special Funds of Jiangxi Province (YC2012-S016).
Cite this article
Wu, J., Zhu, J., Liu, Q. et al. Human mouth-state recognition based on learned discriminative dictionary and sparse representation combined with homotopy. Multimed Tools Appl 74, 10697–10711 (2015). https://doi.org/10.1007/s11042-014-2199-4
DOI: https://doi.org/10.1007/s11042-014-2199-4