
Robust speech recognition based on independent vector analysis using harmonic frequency dependency

  • ICONIP 2011
Neural Computing and Applications

Abstract

This paper describes an algorithm for robust speech recognition that enhances speech by independent vector analysis (IVA) using harmonic frequency dependency. Whereas conventional IVA assumes uniform dependencies across the full band of each source signal, the proposed method introduces a harmonic clique model that improves enhancement performance by modeling the strong dependencies among multiples of the fundamental frequency. The IVA-based learning algorithm is derived under the non-holonomic constraint and the minimal distortion principle to reduce the distortion that IVA otherwise introduces, and a minimum power distortionless response beamformer is used as a pre-processing step. In addition, the algorithm compares the log-spectral features of the enhanced speech with those of the observed noisy speech to identify time–frequency segments corrupted by noise, and restores those segments with a cluster-based missing-feature reconstruction technique. Experimental results demonstrate that the proposed method significantly improves recognition performance in noisy environments, especially in the presence of competing interference.
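
The comparison of log-spectral features described in the abstract can be illustrated with a minimal sketch. The abstract does not give the decision rule, so the version below assumes a simple one: a time–frequency cell is flagged as corrupted when the observed noisy log-spectrum exceeds the enhanced log-spectrum by more than a fixed margin. The function name `unreliable_mask`, the `threshold_db` parameter, and its 3 dB default are hypothetical and are not taken from the paper. The cluster-based reconstruction step is not shown.

```python
import math


def unreliable_mask(noisy_power, enhanced_power, threshold_db=3.0, floor=1e-12):
    """Flag time-frequency cells that look corrupted by noise.

    Inputs are lists of frames, each frame a list of power-spectrum values.
    A cell is flagged when the noisy log-spectrum exceeds the enhanced
    log-spectrum by more than threshold_db (an assumed rule, not the paper's).
    """
    mask = []
    for noisy_frame, enhanced_frame in zip(noisy_power, enhanced_power):
        row = []
        for p_noisy, p_enh in zip(noisy_frame, enhanced_frame):
            # Floor the powers so the logarithm is always defined.
            log_noisy = 10.0 * math.log10(max(p_noisy, floor))
            log_enh = 10.0 * math.log10(max(p_enh, floor))
            row.append(log_noisy - log_enh > threshold_db)
        mask.append(row)
    return mask


# Toy example: 2 frames x 3 frequency bins of power values.
noisy = [[1.0, 4.0, 1.0], [2.0, 1.0, 10.0]]
enhanced = [[1.0, 1.0, 0.9], [1.9, 1.0, 1.0]]
mask = unreliable_mask(noisy, enhanced)
print(mask)  # True marks cells that would be passed to reconstruction
```

In a missing-feature pipeline, the cells marked `True` would be treated as unreliable and re-estimated from the reliable cells. A fixed threshold is only one possible choice, and the appropriate margin depends on the feature domain and the noise conditions.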

Acknowledgments

This work was supported by the Mid-career Researcher Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology of Korea (No. 2011-0027537). We appreciate the valuable comments and advice of Il-Young Jeong.

Author information

Corresponding author

Correspondence to Hyung-Min Park.

About this article

Cite this article

Jun, S., Kim, M., Oh, M. et al. Robust speech recognition based on independent vector analysis using harmonic frequency dependency. Neural Comput & Applic 22, 1321–1327 (2013). https://doi.org/10.1007/s00521-012-1002-6

  • DOI: https://doi.org/10.1007/s00521-012-1002-6
