Human mouth-state recognition based on learned discriminative dictionary and sparse representation combined with homotopy

Multimedia Tools and Applications

Abstract

To detect the number of audio sources and improve the speech recognition capability of an intelligent robot auditory system, this paper studies the recognition of human mouth states (open or closed). A mouth-state recognition algorithm is proposed that combines a learned discriminative dictionary and sparse representation with the homotopy method. In the algorithm, label consistent K-SVD (LC-KSVD) is used to learn a single discriminative over-complete dictionary and an optimal linear classifier simultaneously, and the homotopy algorithm is used at the sparse decomposition stage. Experiments are carried out on a database built from region-of-interest (ROI) images localized in and extracted from face images downloaded via Google. Compared with several state-of-the-art methods, the proposed method achieves higher classification rates (CRs), takes less time to recognize a test sample, and shows good noise immunity. Its advantage is most pronounced when training samples are extremely limited, even with only one sample per class.
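The test-time pipeline described in the abstract (sparse-code a sample over the learned dictionary, then apply the linear classifier to the code) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionary `D` and classifier `W` are tiny hand-built placeholders rather than LC-KSVD outputs, the function names and the 0 = open, 1 = closed label convention are hypothetical, and plain ISTA (iterative soft-thresholding) stands in for the homotopy solver because it targets the same ℓ1-regularized objective in a few lines.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(y, D, lam=0.05, n_iter=500):
    """Approximately solve min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1.

    ISTA is used here as a simple stand-in; the paper uses a homotopy
    solver for its sparse decomposition stage.
    """
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        residual = y - D @ x
        x = soft_threshold(x + step * (D.T @ residual), step * lam)
    return x

def classify_mouth_state(y, D, W, lam=0.05):
    """Sparse-code y over D, then pick the class with the largest score W x."""
    x = ista_sparse_code(y, D, lam)
    return int(np.argmax(W @ x))

# Hypothetical toy dictionary: 4 unit-norm atoms in R^3 (columns).
# Atoms 0-1 are assumed to belong to class 0, atoms 2-3 to class 1.
D = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.0, 0.6, 0.0, 0.6],
              [0.0, 0.0, 1.0, 0.8]])

# Hypothetical linear classifier: each row sums the coefficients of one class.
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

label_a = classify_mouth_state(D[:, 1], D, W)  # sample resembling a class-0 atom
label_b = classify_mouth_state(D[:, 3], D, W)  # sample resembling a class-1 atom
```

In LC-KSVD, `D` and `W` would be learned jointly from labeled training samples, so that codes of same-class samples use similar atoms; the sketch only shows why classifying a new sample then reduces to one sparse decomposition and one matrix-vector product, which is consistent with the short recognition time the abstract reports.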



Acknowledgments

Our work was supported by the National Natural Science Foundation of China (61162014, 61210306074), the Natural Science Foundation of Jiangxi Province (20122BAB201029), the Science & Technology Project of Jiangxi Provincial Department of Education (GJJ13008), the Science and Technology Program of Jiangxi Provincial Department of Education (GJJ14135, GJJ14583) and the Graduate Student Innovation Special Funds of Jiangxi Province (YC2012-S016).

Corresponding author

Correspondence to Ye Zhang.

Cite this article

Wu, J., Zhu, J., Liu, Q. et al. Human mouth-state recognition based on learned discriminative dictionary and sparse representation combined with homotopy. Multimed Tools Appl 74, 10697–10711 (2015). https://doi.org/10.1007/s11042-014-2199-4

  • DOI: https://doi.org/10.1007/s11042-014-2199-4