Robust speech recognition based on independent vector analysis using harmonic frequency dependency

Jun, Soram; Kim, Minook; Oh, Myungwoo; Park, Hyung-Min

doi:10.1007/s00521-012-1002-6

ICONIP 2011
Published: 17 June 2012

Volume 22, pages 1321–1327, (2013)
Neural Computing and Applications

Abstract

This paper describes an algorithm that enhances speech by independent vector analysis (IVA) using harmonic frequency dependency for robust speech recognition. While the conventional IVA exploits the full-band uniform dependencies of each source signal, a harmonic clique model is introduced to improve the enhancement performance by modeling strong dependencies among multiples of fundamental frequencies. An IVA-based learning algorithm is derived to consider the non-holonomic constraint and the minimal distortion principle to reduce the unavoidable distortion of IVA, and the minimum power distortionless response beamformer is used as a pre-processing step. In addition, the algorithm compares the log-spectral features of the enhanced speech and observed noisy speech to identify time–frequency segments corrupted by noise and restores those with the cluster-based missing feature reconstruction technique. Experimental results demonstrate that the proposed method enhances recognition performance significantly in noisy environments, especially with competing interference.
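
Two ideas in the abstract can be illustrated with a minimal sketch: grouping frequency bins at integer multiples of a fundamental into a harmonic clique, and flagging time–frequency cells as noise-corrupted by comparing log-spectral features of the observed and enhanced speech. This is not the authors' implementation. The function names, the bin-index representation of the fundamental, the simple difference-and-threshold rule, and the threshold value are all assumptions made for illustration; the paper's actual clique definition and comparison rule are not given in the abstract.

```python
from typing import List


def harmonic_clique(f0_bin: int, num_bins: int) -> List[int]:
    """Return bin indices at integer multiples of f0_bin that lie below num_bins.

    Hypothetical helper: assumes the fundamental frequency is given as a
    positive STFT bin index, which is an illustrative simplification.
    """
    if f0_bin < 1:
        raise ValueError("f0_bin must be a positive bin index")
    return list(range(f0_bin, num_bins, f0_bin))


def unreliable_mask(
    log_observed: List[List[float]],
    log_enhanced: List[List[float]],
    threshold: float = 1.0,
) -> List[List[bool]]:
    """Mark cells where the observed log-spectrum exceeds the enhanced one.

    Assumed rule: a cell is treated as noise-corrupted when
    (observed - enhanced) > threshold. Both inputs are indexed as
    [frame][frequency] and must have the same shape.
    """
    return [
        [(obs - enh) > threshold for obs, enh in zip(obs_row, enh_row)]
        for obs_row, enh_row in zip(log_observed, log_enhanced)
    ]


# Harmonics of bin 3 in a 10-bin spectrum.
clique = harmonic_clique(3, 10)

# One frame, two frequency cells: only the second differs by more than 1.0.
mask = unreliable_mask([[2.0, 5.0]], [[1.8, 2.0]])
```

Cells flagged `True` would be the ones handed to a missing-feature reconstruction step; that step is not sketched here.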

Acknowledgments

This work was supported by the Mid-career Researcher Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology of Korea (No. 2011-0027537). We appreciate the valuable comments and advice of Il-Young Jeong.

Author information

Authors and Affiliations

Department of Electronic Engineering, Sogang University, 35 Baekbeom-ro (Sinsu-dong), Mapo-gu, Seoul, 121-742, Republic of Korea
Soram Jun, Minook Kim, Myungwoo Oh & Hyung-Min Park

Corresponding author

Correspondence to Hyung-Min Park.

About this article

Cite this article

Jun, S., Kim, M., Oh, M. et al. Robust speech recognition based on independent vector analysis using harmonic frequency dependency. Neural Comput & Applic 22, 1321–1327 (2013). https://doi.org/10.1007/s00521-012-1002-6

Received: 10 March 2012
Accepted: 05 June 2012
Published: 17 June 2012
Issue Date: June 2013
DOI: https://doi.org/10.1007/s00521-012-1002-6
