
A GMM/HMM model for reconstruction of missing speech spectral components for continuous speech recognition

International Journal of Speech Technology

Abstract

This paper presents a method for reconstructing unreliable spectral components of speech signals using the statistical distributions of the clean components. Our goal is to model the temporal patterns in the speech signal and to exploit correlations between speech features in the time and frequency domains simultaneously. In this approach, a hidden Markov model (HMM) is first trained on clean speech data to model the temporal patterns that appear in sequences of spectral components. Using this model, and according to the probability of observing each noisy spectral component in each state, probability distributions for the noisy components are estimated. Then, by applying maximum a posteriori (MAP) estimation to these distributions, the final estimates of the unreliable spectral components are obtained. The proposed method is compared with a common missing-feature method based on probabilistic clustering of the feature vectors, and with a state-of-the-art method based on sparse reconstruction. The experimental results show a significant improvement in recognition accuracy on a noise-corrupted Persian corpus.
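The general idea in the abstract (HMM trained on clean speech, state probabilities computed from the noisy frames, then a MAP estimate of each unreliable component) can be illustrated with a minimal sketch. This is not the authors' implementation, and several simplifications are assumptions made only for illustration: each state has a single diagonal Gaussian (the title suggests GMM emissions), unreliable components are marginalised out when computing state posteriors with the forward-backward algorithm, the MAP step uses only the most probable state per frame rather than the full posterior-weighted distribution, and, following the common missing-feature convention, the clean log-spectral value is assumed not to exceed the observed noisy value. The function `reconstruct` and its argument names are hypothetical.

```python
import math
from typing import Sequence


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(vals: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def reconstruct(obs, mask, log_pi, log_A, means, variances):
    """Hypothetical helper: fill in unreliable log-spectral components.

    obs[t][d]        noisy log-spectral value of frame t, channel d
    mask[t][d]       True if that component is reliable
    log_pi[j]        log initial probability of state j
    log_A[i][j]      log P(state j at t+1 | state i at t)
    means[j][d], variances[j][d]  diagonal Gaussian of state j (clean speech)
    """
    T, N, D = len(obs), len(log_pi), len(obs[0])
    # Emission log-likelihoods from the reliable components only: for a
    # diagonal Gaussian, marginalising a dimension means dropping its term.
    log_b = [[sum(log_gauss(obs[t][d], means[j][d], variances[j][d])
                  for d in range(D) if mask[t][d])
              for j in range(N)] for t in range(T)]

    # Forward pass in the log domain.
    alpha = [[0.0] * N for _ in range(T)]
    for j in range(N):
        alpha[0][j] = log_pi[j] + log_b[0][j]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = log_b[t][j] + logsumexp(
                [alpha[t - 1][i] + log_A[i][j] for i in range(N)])

    # Backward pass in the log domain (beta at the last frame is log 1 = 0).
    beta = [[0.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = logsumexp(
                [log_A[i][j] + log_b[t + 1][j] + beta[t + 1][j] for j in range(N)])

    result = []
    for t in range(T):
        # The state posterior is proportional to exp(alpha + beta); this sketch
        # keeps only the dominant state instead of the full mixture.
        best = max(range(N), key=lambda j: alpha[t][j] + beta[t][j])
        frame = []
        for d in range(D):
            if mask[t][d]:
                frame.append(obs[t][d])  # reliable: keep the observation
            else:
                # Mode of the state's Gaussian truncated above at the noisy
                # value (assumption: clean energy <= noisy energy).
                frame.append(min(means[best][d], obs[t][d]))
        result.append(frame)
    return result


# Toy example: two states, two channels, channel 1 unreliable in every frame.
log_pi = [math.log(0.5), math.log(0.5)]
log_A = [[math.log(0.9), math.log(0.1)],
         [math.log(0.1), math.log(0.9)]]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
obs = [[0.1, 9.0], [0.2, 9.0], [5.1, 3.0]]
mask = [[True, False]] * 3
restored = reconstruct(obs, mask, log_pi, log_A, means, variances)
print(restored)
```

In the toy run, the reliable channel places the first two frames in state 0 and the last in state 1, so the unreliable channel is filled with 0.0, 0.0, and then 3.0, because the state-1 mean of 5.0 exceeds the observed noisy bound of 3.0. The printed result is `[[0.1, 0.0], [0.2, 0.0], [5.1, 3.0]]`. Replacing the dominant-state step with a search for the mode of the posterior-weighted mixture would follow the abstract's description more closely, at the cost of a numerical optimisation per component.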

Author information

Correspondence to Mohammad Mohsen Goodarzi.

About this article

Cite this article

Goodarzi, M.M., Almasganj, F. A GMM/HMM model for reconstruction of missing speech spectral components for continuous speech recognition. Int J Speech Technol 19, 769–777 (2016). https://doi.org/10.1007/s10772-016-9369-x

  • DOI: https://doi.org/10.1007/s10772-016-9369-x
