Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech

Uluskan, Seçkin; Sangwan, Abhijeet; Hansen, John H. L.

doi:10.1007/s10772-017-9449-6

Published: 17 August 2017

Volume 20, pages 799–811 (2017)

International Journal of Speech Technology

Abstract

Distant speech capture in lecture halls and auditoriums poses unique challenges for algorithm development in automatic speech recognition (ASR). In this study, a new adaptation strategy for distant noisy speech is developed by means of phoneme classes. Unlike previous approaches, which adapt the acoustic model to the features, the proposed phoneme-class based feature adaptation (PCBFA) strategy adapts the features of distant data to the existing acoustic model, which was previously trained on close-microphone speech. The essence of PCBFA is a transformation that makes the distributions of phoneme classes in distant noisy speech similar to those of a close-talk microphone acoustic model in a multidimensional MFCC space. To achieve this, the phoneme classes of distant noisy speech are recognized via artificial neural networks. PCBFA thus adapts features rather than acoustic models. The main idea behind PCBFA is illustrated with a conventional Gaussian mixture model–hidden Markov model (GMM–HMM) system, although it can be extended to newer ASR architectures. The adapted features, together with the improved acoustic models produced by PCBFA, are shown to outperform those obtained by acoustic model adaptation alone for both ASR and keyword spotting. PCBFA offers a powerful new perspective on acoustic modeling of distant speech.
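The abstract does not specify the PCBFA transformation itself. As a minimal sketch of the underlying idea, the example below makes the class-conditional distributions of distant features resemble close-talk ones. It assumes the simplest possible per-class mapping: matching the mean and standard deviation of each MFCC dimension within each phoneme class. It also assumes that phoneme-class labels for the distant frames are already available. In PCBFA these labels come from an artificial neural network; here they are simply given. The functions `class_stats` and `adapt_features` are hypothetical illustrations, not the authors' published method.

```python
import numpy as np


def class_stats(features, labels, num_classes, eps=1e-8):
    """Per-class mean and standard deviation of each feature dimension.

    features: (frames, dims) array, e.g. MFCC vectors.
    labels:   (frames,) integer phoneme-class index per frame.
    Classes with no frames keep mean 0 and std 1 as placeholders.
    """
    dims = features.shape[1]
    means = np.zeros((num_classes, dims))
    stds = np.ones((num_classes, dims))
    for c in range(num_classes):
        frames = features[labels == c]
        if len(frames) > 0:
            means[c] = frames.mean(axis=0)
            stds[c] = frames.std(axis=0) + eps  # eps avoids division by zero
    return means, stds


def adapt_features(distant, distant_labels, close_means, close_stds, num_classes):
    """Hypothetical per-class mean/variance matching (not the published PCBFA transform).

    Each distant frame is standardized with its own class statistics and then
    re-scaled to the close-talk statistics of the same class.
    """
    d_means, d_stds = class_stats(distant, distant_labels, num_classes)
    adapted = distant.astype(float).copy()  # frames with out-of-range labels stay unchanged
    for c in range(num_classes):
        mask = distant_labels == c
        adapted[mask] = (distant[mask] - d_means[c]) / d_stds[c] * close_stds[c] + close_means[c]
    return adapted


# Usage with synthetic data: 13 MFCC dims and 3 illustrative phoneme classes.
rng = np.random.default_rng(0)
close = rng.normal(size=(300, 13))
close_labels = rng.integers(0, 3, size=300)
distant = 0.5 * rng.normal(size=(200, 13)) + 2.0  # shifted, compressed "distant" features
distant_labels = rng.integers(0, 3, size=200)

close_means, close_stds = class_stats(close, close_labels, 3)
adapted = adapt_features(distant, distant_labels, close_means, close_stds, 3)
```

This differs from a single global normalization such as cepstral mean and variance normalization. Each phoneme class receives its own shift and scale, so the adapted features can line up class by class with an acoustic model trained on close-talk speech, and that model is left unchanged.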

Acknowledgements

This project was funded by AFRL under contract FA8750-12-1-0188, and partially by the University of Texas at Dallas from the Distinguished University Chair in Telecommunications Engineering held by J.H.L. Hansen.

Author information

Authors and Affiliations

Anadolu University, 26555, Eskişehir, Turkey
Seçkin Uluskan
The University of Texas at Dallas, Richardson, TX, 75080, USA
Abhijeet Sangwan & John H. L. Hansen

Corresponding author

Correspondence to Seçkin Uluskan.

About this article

Cite this article

Uluskan, S., Sangwan, A. & Hansen, J.H.L. Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech. Int J Speech Technol 20, 799–811 (2017). https://doi.org/10.1007/s10772-017-9449-6

Received: 01 November 2016
Accepted: 07 August 2017
Published: 17 August 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10772-017-9449-6
