A review of supervised learning algorithms for single channel speech enhancement

Published in: International Journal of Speech Technology

Abstract

Reducing interfering noise in a noisy speech recording is a difficult task in many voice-related applications. From hands-free communication to human–machine interaction, the speech signal of interest captured by a microphone is almost always mixed with interfering noise. The interfering noise introduces new frequency components and masks a large portion of the time-varying spectrum of the desired speech, which significantly degrades our perception of that speech when listening to the noisy observations. It is therefore highly desirable, and sometimes crucial, to clean up noisy speech signals. This clean-up process is referred to as speech enhancement (SE); SE aims to improve the intelligibility and quality of speech for communication. We present a comprehensive review of supervised single-channel speech enhancement (SCSE) algorithms. First, a classification-based overview of supervised SCSE algorithms is provided and the related work is outlined. The recent literature on SCSE algorithms from a supervised perspective is then reviewed. Finally, some open research problems that need further investigation are identified.
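The additive noise model and mask-based enhancement described above can be sketched in a few lines. The following is an illustrative example only, not code from the paper: the function names and the toy spectrogram are our own, and the ideal binary mask (IBM) shown here is just one common supervised training target from the SCSE literature.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture speech + noise has the requested SNR (dB)."""
    ps = np.mean(speech ** 2)
    pn = np.mean(noise ** 2)
    scale = np.sqrt(ps / (pn * 10 ** (snr_db / 10)))
    return speech + scale * noise

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """IBM: 1 where the local speech-to-noise ratio exceeds lc_db, else 0."""
    snr_local = 20 * np.log10(speech_mag / (noise_mag + 1e-12) + 1e-12)
    return (snr_local > lc_db).astype(float)

# Toy magnitude 'spectrograms' (frequency x time), purely for illustration.
speech_mag = np.abs(rng.normal(1.0, 0.3, (4, 5)))
noise_mag = np.abs(rng.normal(0.5, 0.2, (4, 5)))

mask = ideal_binary_mask(speech_mag, noise_mag)
# Approximate the noisy magnitude by the sum of the components (a common
# simplification), then suppress noise-dominated time-frequency units.
enhanced = mask * (speech_mag + noise_mag)
```

A supervised SCSE system would learn to predict `mask` (or the clean magnitude directly) from features of the noisy mixture; here the mask is computed from the known components only to illustrate the target.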


Figures 1–12 (sources where noted): Fig. 6 from Han and Wang (2013); Fig. 7 from Wang et al. (2012); Fig. 8 from Chung et al. (2017); Fig. 9 from Mohammadiha et al. (2013); Fig. 10 from Hussain et al. (2017); Fig. 11 from Xu et al. (2015); Fig. 12 from Mohammed and Tashev (2017).


References

  • Ali, S. M., & Gupta, B. Speech enhancement using neural network.

  • Allen, J. B. (1994). How do humans process and recognize speech? IEEE Transactions on Speech and Audio Processing, 2(4), 567–577.

  • Arehart, K. H., Hansen, J. H., Gallant, S., & Kalstein, L. (2003). Evaluation of an auditory masked threshold noise suppression algorithm in normal-hearing and hearing-impaired listeners. Speech Communication, 40(4), 575–592.

  • Baer, T., Moore, B. C., & Gatehouse, S. (1993). Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: Effects on intelligibility, quality, and response times. Journal of Rehabilitation Research and Development, 30, 49.

  • Bahoura, M., & Rouat, J. (2001). Wavelet speech enhancement based on the Teager energy operator. IEEE Signal Processing Letters, 8(1), 10–12.

  • Bentler, R., Wu, Y. H., Kettel, J., & Hurtig, R. (2008). Digital noise reduction: Outcomes from laboratory and field studies. International Journal of Audiology, 47(8), 447–460.

  • Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

  • Chazan, S. E., Goldberger, J., & Gannot, S. (2016). A hybrid approach for speech enhancement using MoG model and neural network phoneme classifier. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12), 2516–2530.

  • Chen, J., Wang, Y., & Wang, D. (2014). A feature study for classification-based speech separation at low signal-to-noise ratios. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1993–2002.

  • Chen, J., Wang, Y., & Wang, D. (2016). Noise perturbation for supervised speech separation. Speech Communication, 78, 1–10.

  • Chiluveru, S. R., & Tripathy, M. (2019). Low SNR speech enhancement with DNN based phase estimation. International Journal of Speech Technology, 22(1), 283–292.

  • Chung, H., Plourde, E., & Champagne, B. (2016, March). Basis compensation in non-negative matrix factorization model for speech enhancement. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2249–2253). IEEE.

  • Chung, H., Plourde, E., & Champagne, B. (2017). Regularized non-negative matrix factorization with Gaussian mixtures and masking model for speech enhancement. Speech Communication, 87, 18–30.

  • Cohen, I. (2002). Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator. IEEE Signal Processing Letters, 9(4), 113–116.

  • Cohen, I., & Berdugo, B. (2001). Speech enhancement for non-stationary noise environments. Signal Processing, 81(11), 2403–2418.

  • Cohen, I., & Berdugo, B. (2002). Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Processing Letters, 9(1), 12–15.

  • Deng, L., Hinton, G., & Kingsbury, B. (2013, May). New types of deep neural network learning for speech recognition and related applications: An overview. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8599–8603). IEEE.

  • Eggert, J., Wersing, H., & Korner, E. (2004, July). Transformation-invariant representation and NMF. In 2004 IEEE International Joint Conference on Neural Networks (Vol. 4, pp. 2535–2539). IEEE.

  • Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121.

  • Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.

  • Ephraim, Y., & van Trees, H. L. (1995). A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 3(4), 251–266.

  • Févotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation, 23(9), 2421–2456.

  • Glorot, X., & Bengio, Y. (2010, March). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249–256).

  • Gordon-Salant, S. (1987). Effects of acoustic modification on consonant recognition by elderly hearing-impaired subjects. The Journal of the Acoustical Society of America, 81(4), 1199–1202.

  • Han, K., & Wang, D. (2012). A classification based approach to speech segregation. The Journal of the Acoustical Society of America, 132(5), 3475–3483.

  • Han, K., & Wang, D. (2013). Towards generalizing classification based speech separation. IEEE Transactions on Audio, Speech and Language Processing, 21(1), 168–177.

  • Han, W., Zhang, X., Min, G., & Sun, M. (2016). A perceptually motivated approach for speech enhancement based on deep neural network. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 99(4), 835–838.

  • Han, W., Zhang, X., Min, G., Zhou, X., & Sun, M. (2017). Joint optimization of perceptual gain function and deep neural networks for single-channel speech enhancement. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 100(2), 714–717.

  • Hansen, J. H., & Clements, M. A. (1991). Constrained iterative speech enhancement with application to speech recognition. IEEE Transactions on Signal Processing, 39(4), 795–805.

  • Helfer, K. S., & Wilber, L. A. (1990). Hearing loss, aging, and speech perception in reverberation and noise. Journal of Speech, Language, and Hearing Research, 33(1), 149–155.

  • Hermus, K., & Wambacq, P. (2006). A review of signal subspace speech enhancement and its application to noise robust speech recognition. EURASIP Journal on Advances in Signal Processing, 2007(1), 045821.

  • Hirsch, H. G., & Ehrlicher, C. (1995, May). Noise estimation techniques for robust speech recognition. In 1995 International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 153–156). IEEE.

  • Hu, Y., & Loizou, P. C. (2004). Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Transactions on Speech and Audio Processing, 12(1), 59–67.

  • Hu, Y., & Loizou, P. C. (2007a, April). A comparative intelligibility study of speech enhancement algorithms. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'07) (Vol. 4, pp. IV-561). IEEE.

  • Hu, Y., & Loizou, P. C. (2007b). A comparative intelligibility study of single-microphone noise reduction algorithms. The Journal of the Acoustical Society of America, 122(3), 1777–1786.

  • Hu, G., & Wang, D. (2010). A tandem algorithm for pitch estimation and voiced speech segregation. IEEE Transactions on Audio, Speech and Language Processing, 18(8), 2067–2079.

  • Hu, Y., Zhang, X., Zou, X., Sun, M., Min, G., & Li, Y. (2016). Improved semi-supervised NMF based real-time capable speech enhancement. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 99(1), 402–406.

  • Hu, Y., Zhang, X., Zou, X., Sun, M., Zheng, Y., & Min, G. (2017). Semi-supervised speech enhancement combining nonnegative matrix factorization and robust principal component analysis. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 100(8), 1714–1719.

  • Huang, P. S., Kim, M., Hasegawa-Johnson, M., & Smaragdis, P. (2015). Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(12), 2136–2147.

  • Hussain, T., Siniscalchi, S. M., Lee, C. C., Wang, S. S., Tsao, Y., & Liao, W. H. (2017). Experimental study on extreme learning machine applications for speech enhancement. IEEE Access, 5, 25542–25554.

  • Jamieson, D. G., Brennan, R. L., & Cornelisse, L. E. (1995). Evaluation of a speech enhancement strategy with normal-hearing and hearing-impaired listeners. Ear and Hearing, 16(3), 274–286.

  • Jin, Z., & Wang, D. (2009). A supervised learning approach to monaural segregation of reverberant speech. IEEE Transactions on Audio, Speech and Language Processing, 17(4), 625–638.

  • Joder, C., Weninger, F., Eyben, F., Virette, D., & Schuller, B. (2012, March). Real-time speech separation by semi-supervised nonnegative matrix factorization. In International Conference on Latent Variable Analysis and Signal Separation (pp. 322–329). Berlin, Heidelberg: Springer.

  • Kim, G., & Loizou, P. C. (2010). Improving speech intelligibility in noise using environment-optimized algorithms. IEEE Transactions on Audio, Speech and Language Processing, 18(8), 2080–2090.

  • Kim, G., Lu, Y., Hu, Y., & Loizou, P. C. (2009). An algorithm that improves speech intelligibility in noise for normal-hearing listeners. The Journal of the Acoustical Society of America, 126(3), 1486–1494.

  • Kim, W., & Stern, R. M. (2011). Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise. Speech Communication, 53(1), 1–11.

  • Kolbæk, M., Tan, Z. H., & Jensen, J. (2017). Speech intelligibility potential of general and specialized deep neural network based speech enhancement systems. IEEE/ACM Transactions on Audio, Speech and Language Processing, 25(1), 153–167.

  • Koul, R. K., & Allen, G. D. (1993). Segmental intelligibility and speech interference thresholds of high-quality synthetic speech in presence of noise. Journal of Speech, Language, and Hearing Research, 36(4), 790–798.

  • Krishnamoorthy, P., & Prasanna, S. M. (2009). Temporal and spectral processing methods for processing of degraded speech: A review. IETE Technical Review, 26(2), 137–148.

  • Larochelle, H., Bengio, Y., Louradour, J., & Lamblin, P. (2009). Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10, 1–40.

  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

  • Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.

  • Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In M. C. Mozer, M. E. Hasselmo, & D. S. Touretzky (Eds.), Advances in neural information processing systems (pp. 556–562). Cambridge: MIT Press.

  • Levitt, H. (2001). Noise reduction in hearing aids: A review. Journal of Rehabilitation Research and Development, 38(1), 111–122.

  • Li, Y., & Kang, S. (2016). Deep neural network-based linear predictive parameter estimations for speech enhancement. IET Signal Processing, 11(4), 469–476.

  • Loizou, P. C. (2007). Speech enhancement: Theory and practice. Boca Raton, FL: CRC.

  • Loizou, P. C. (2011). Speech quality assessment. In Multimedia analysis, processing and communications (pp. 623–654). Berlin, Heidelberg: Springer.

  • Lotter, T., & Vary, P. (2005). Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP Journal on Advances in Signal Processing, 2005(7), 354850.

  • Ludeña-Choez, J., & Gallardo-Antolín, A. (2012). Speech denoising using non-negative matrix factorization with Kullback–Leibler divergence and sparseness constraints. In Advances in Speech and Language Technologies for Iberian Languages (pp. 207–216). Berlin, Heidelberg: Springer.

  • Luts, H., Eneman, K., Wouters, J., Schulte, M., Vormann, M., Buechler, M., … & Puder, H. (2010). Multicenter evaluation of signal enhancement algorithms for hearing aids. The Journal of the Acoustical Society of America, 127(3), 1491–1505.

  • Lyubimov, N., & Kotov, M. (2013). Non-negative matrix factorization with linear constraints for single-channel speech enhancement. http://arxiv.org/abs/1309.6047.

  • Ma, J., & Loizou, P. C. (2011). SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech. Speech Communication, 53(3), 340–354.

  • Martin, R. (2005). Speech enhancement based on minimum mean-square error estimation and supergaussian priors. IEEE Transactions on Speech and Audio Processing, 13(5), 845–856.

  • May, T., & Dau, T. (2014). Requirements for the evaluation of computational speech segregation systems. The Journal of the Acoustical Society of America, 136(6), 398–404.

  • Mohammadiha, N., Smaragdis, P., & Leijon, A. (2013). Supervised and unsupervised speech enhancement using nonnegative matrix factorization. IEEE Transactions on Audio, Speech and Language Processing, 21(10), 2140–2151.

  • Mohammed, S., & Tashev, I. (2017, March). A statistical approach to semi-supervised speech enhancement with low-order non-negative matrix factorization. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 546–550). IEEE.

  • Moore, B. C. (2003). Speech processing for the hearing-impaired: Successes, failures, and implications for speech mechanisms. Speech Communication, 41(1), 81–91.

  • Mysore, G. J., & Smaragdis, P. (2011, May). A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 17–20). IEEE.

  • Nidhyananthan, S. S., Kumari, R. S. S., & Prakash, A. A. (2014). A review on speech enhancement algorithms and why to combine with environment classification. International Journal of Modern Physics C, 25(10), 1430002.

  • Nielsen, M. A. (2015). Neural networks and deep learning (Vol. 25). San Francisco, CA: Determination Press.

  • Ozerov, A., Philippe, P., Bimbot, F., & Gribonval, R. (2007). Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs. IEEE Transactions on Audio, Speech and Language Processing, 15(5), 1564–1578.

  • Pal, S. K., & Mitra, S. (1992). Multilayer perceptron, fuzzy sets, and classification. IEEE Transactions on Neural Networks, 3(5), 683–697.

  • Plapous, C., Marro, C., & Scalart, P. (2006). Improved signal-to-noise ratio estimation for speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, 14(6), 2098–2108.

  • Quackenbush, S. R. (1995). Objective measures of speech quality (Doctoral dissertation, Georgia Institute of Technology).

  • Raj, B., Virtanen, T., Chaudhuri, S., & Singh, R. (2010). Non-negative matrix factorization based compensation of music for automatic speech recognition. In Eleventh Annual Conference of the International Speech Communication Association.

  • Rehr, R., & Gerkmann, T. (2017). Normalized features for improving the generalization of DNN based speech enhancement. http://arxiv.org/abs/1709.02175.

  • Rezayee, A., & Gazor, S. (2001). An adaptive KLT approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 9(2), 87–95.

  • Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001, May). Perceptual evaluation of speech quality (PESQ): A new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 2, pp. 749–752). IEEE.

  • Roberts, S. J., Husmeier, D., Rezek, I., & Penny, W. (1998). Bayesian approaches to Gaussian mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1133–1142.

  • Roweis, S. T. (2001). One microphone source separation. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (pp. 793–799). Cambridge: MIT Press.

  • Ruck, D. W., Rogers, S. K., & Kabrisky, M. (1990a). Feature selection using a multilayer perceptron. Journal of Neural Network Computing, 2(2), 40–48.

  • Ruck, D. W., Rogers, S. K., Kabrisky, M., Oxley, M. E., & Suter, B. W. (1990b). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, 1(4), 296–298.

  • Sainath, T. N., Vinyals, O., Senior, A., & Sak, H. (2015, April). Convolutional, long short-term memory, fully connected deep neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4580–4584). IEEE.

  • Saleem, N. (2017). Single channel noise reduction system in low SNR. International Journal of Speech Technology, 20(1), 89–98.

  • Saleem, N., & Khattak, M. I. (2019). Deep neural networks for speech enhancement in complex-noisy environments. International Journal of Interactive Multimedia and Artificial Intelligence (in press), 1–7.

  • Saleem, N., Irfan Khattak, M., & Qazi, A. B. (2019a). Supervised speech enhancement based on deep neural network. Journal of Intelligent & Fuzzy Systems. https://doi.org/10.3233/JIFS-190047.

  • Saleem, N., Khattak, M. I., Ali, M. Y., & Shafi, M. (2019b). Deep neural network for supervised single-channel speech enhancement. Archives of Acoustics, 44(1), 3–12.

  • Sang, J. (2012). Evaluation of the sparse coding shrinkage noise reduction algorithm for the hearing impaired (Doctoral dissertation, University of Southampton).

  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

  • Scholkopf, B., & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge: MIT Press.

  • Seltzer, M. L., Raj, B., & Stern, R. M. (2004). A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition. Speech Communication, 43(4), 379–393.

  • Sharma, P., Abrol, V., & Sao, A. K. (2015, February). Supervised speech enhancement using compressed sensing. In 2015 Twenty First National Conference on Communications (NCC) (pp. 1–5). IEEE.

  • Smaragdis, P. (2007). Convolutive speech bases and their application to supervised speech separation. IEEE Transactions on Audio, Speech and Language Processing, 15(1), 1–12.

  • Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199–222.

  • Sun, P., & Qin, J. (2016). Semi-supervised speech enhancement in envelop and details subspaces. http://arxiv.org/abs/1609.09443.

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. http://arxiv.org/abs/1312.6199.

  • Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2010, March). A short-time objective intelligibility measure for time-frequency weighted noisy speech. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4214–4217). IEEE.

  • Tang, J., Deng, C., & Huang, G. B. (2016). Extreme learning machine for multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems, 27(4), 809–821.

  • Tchorz, J., & Kollmeier, B. (2003). SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing, 11(3), 184–192.

  • Tsoukalas, D. E., Mourjopoulos, J. N., & Kokkinakis, G. (1997). Speech enhancement based on audible noise suppression. IEEE Transactions on Speech and Audio Processing, 5(6), 497–514.

  • Vary, P., & Martin, R. (2006). Digital speech transmission: Enhancement, coding and error concealment. Hoboken: Wiley.

  • Virag, N. (1999). Single channel speech enhancement based on masking properties of the human auditory system. IEEE Transactions on Speech and Audio Processing, 7(2), 126–137.

  • Virtanen, T. (2007). Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Transactions on Audio, Speech and Language Processing, 15(3), 1066–1074.

  • Wang, Y., Han, K., & Wang, D. (2012). Acoustic features for classification based speech separation. In Thirteenth Annual Conference of the International Speech Communication Association.

  • Wang, Y., Han, K., & Wang, D. (2013). Exploring monaural features for classification-based speech segregation. IEEE Transactions on Audio, Speech and Language Processing, 21(2), 270–279.

  • Wang, Y., & Wang, D. (2013). Towards scaling up classification-based speech separation. IEEE Transactions on Audio, Speech and Language Processing, 21(7), 1381–1390.

  • Weninger, F., Roux, J. L., Hershey, J. R., & Watanabe, S. (2014). Discriminative NMF and its application to single-channel source separation. In Fifteenth Annual Conference of the International Speech Communication Association.

  • Wiest, J., Höffken, M., Kreßel, U., & Dietmayer, K. (2012, June). Probabilistic trajectory prediction with Gaussian mixture models. In 2012 IEEE Intelligent Vehicles Symposium (pp. 141–146). IEEE.

  • Xiao, X., Zhao, S., Nguyen, D. H. H., Zhong, X., Jones, D. L., Chng, E. S., et al. (2016). Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation. EURASIP Journal on Advances in Signal Processing, 2016(1), 4.

  • Xu, Y., Du, J., Dai, L. R., & Lee, C. H. (2014). An experimental study on speech enhancement based on deep neural networks. IEEE Signal Processing Letters, 21(1), 65–68.

  • Xu, Y., Du, J., Dai, L. R., & Lee, C. H. (2015). A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(1), 7–19.


Acknowledgements

The authors would like to express their gratitude to the journal editor and the anonymous reviewers for their supportive, helpful, and constructive comments.

Author information


Correspondence to Nasir Saleem.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Saleem, N., Khattak, M.I. A review of supervised learning algorithms for single channel speech enhancement. Int J Speech Technol 22, 1051–1075 (2019). https://doi.org/10.1007/s10772-019-09645-2

