
Fundamentals, present and future perspectives of speech enhancement

International Journal of Speech Technology

Abstract

Speech enhancement is of substantial interest in applications such as speaker identification, video conferencing, speech transmission over communication channels, speech-based biometric systems, mobile phones, hearing aids, microphones, and voice conversion. Pattern mining methods play a vital role in the development of speech enhancement schemes. Designing a successful speech enhancement system requires careful attention to background noise processing. A substantial number of methods, drawn from both traditional techniques and machine learning, have been used to process and remove additive noise from speech signals. With the advancement of machine learning and deep learning, speech classification has become more significant. Speech enhancement methods consist of several stages: feature extraction from the input speech signal, feature selection, and classification. Deep learning techniques, an emerging field in the classification domain, are also discussed in this review. This paper aims to summarize the state of the art, present approaches based on widely used machine learning and deep learning methods, and identify the challenges and future research directions of speech enhancement systems.


References

  • Abd El-Fattah, M. A., Dessouky, M. I., Abbas, A. M., Diab, S. M., El-Rabaie, E.-S. M., Al-Nuaimy, W., Abd El-samie, F. E. (2013). Speech enhancement with an adaptive Wiener filter. International Journal of Speech Technology, 17(1), 53–64.

    Google Scholar 

  • Ahmed, J.& Ikram, N. (2003). Frequency-domain speech scrambling/descrambling techniques implementation & evaluation on DSP. In 7th International Multi Topic Conference, 2003. INMIC 2003 (pp. 781–789).

  • Al-Shoshan, A. I. (2006). Speech and music classification and separation: A review. Journal of King Saud University—WEngineering Sciences, 19(1), 95–132.

    Google Scholar 

  • Ando, Y. (2013). Autocorrelation-based features for speech representation. The Journal of the Acoustical Society of America, 133(5), 1–8.

    Google Scholar 

  • Ang, L. M., Seng, K. P., & Heng, T. Z. (2016). Information communication assistive technologies for visually impaired people. International Journal of Ambient Computing and Intelligence, 7(1), 45–68.

    Google Scholar 

  • Araki, S., Ono, N., Kinoshita, K., & Delcroix, M. (2018). Comparison of reference microphone selection algorithms for distributed microphone array based speech enhancement in meeting recognition scenarios. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC) (pp. 316–320).

  • Arslan, L. M., & Hansen, J. H. L. (1997). Speech enhancement for crosstalk interference. IEEE Signal Processing Letters, 4(4), 92–95.

    Google Scholar 

  • Atmaja, B. T., Farid, M. N., & Arifianto, D. (2016). Speech enhancement on smartphone voice recording, 8th international conference on physics & its applications (ICOPIA). Journal of Physics: Conference Series, 776, 1–6.

    Google Scholar 

  • Bachu, R., Kopparthi, S., Adapa, B., & Barkana, B. (2010). Voiced/unvoiced decision for speech signals based on zero-crossing rate and energy. In K. Elleithy (Ed.), Advanced techniques in computing sciences and software engineering (pp. 279–284). Dordrecht: Springer.

    Google Scholar 

  • Bai, H. & Wan, E.A. (2003). Two-pass quantile based noise spectrum estimation. Center of Spoken Language Understanding, OGI School of Science & Engineering at OHSU (pp. 12–16).

  • Baishya, A., & Kumar, P. (2018). Speech de-noising using wavelet based methods with focus on classification of speech into voiced, unvoiced and silence regions. In 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN).

  • Barman, P. C., & Lee, S.-Y. (2008). Nonnegative matrix factorization (NMF) based supervised feature selection and adaptation. In Intelligent Data Engineering and Automated Learning—IDEAL 2008 (pp. 120–127).

  • Baumgarten, M., Mulvenna, M. D., Rooney, N., & Reid, J. (2013). Keyword-based sentiment mining using twitter. International Journal of Ambient Computing and Intelligence, 5(2), 56–69.

    Google Scholar 

  • Beh, J., Baran, R. H., & Ko, H. (2006). Dual channel based speech enhancement using novelty filter for robust speech recognition in automobile environment. IEEE Transactions on Consumer Electronics, 52(2), 583–589.

    Google Scholar 

  • Berouti, M., Schwartz, R. & Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. In Proceedings on IEEE ICASSP’79, Washington, DC, Apr. 1979 (pp. 208–211).

  • Bhat, G. S., Shankar, N., Reddy, C. K. A., & Panahi, I. M. S. (2019). A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone. IEEE Access, 7, 78421–78433. https://doi.org/10.1109/access.2019.2922370.

    Article  Google Scholar 

  • Biem, A., Katagiri, S., & Juang, B.-H. (1993). Discriminative feature extraction for speech recognition. In Neural Networks for Signal Processing III—Proceedings of the 1993 IEEE-SP Workshop.

  • Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, & Signal Processing, 27(2), 113–120.

    Google Scholar 

  • Brandstein, M. S., & Griebel, S. M. (2000). Nonlinear, model-based microphone array speech enhancement. In Acoustic signal processing for telecommunication (pp. 261–279).

  • Bureš, V., Tučník, P., Mikulecký, P., Mls, K., & Blecha, P. (2016). Application of ambient intelligence in educational institutions: Visions and architectures. International Journal of Ambient Computing Intelligence, 7, 94–120.

    Google Scholar 

  • Chaudhari, A., & Dhonde, S. B. (2015). A review on speech enhancement techniques. In 2015 International Conference on Pervasive Computing (ICPC) (pp. 272–275).

  • Chawla, M. P. S. (2011). PCA and ICA processing methods for removal of artifacts and noise in electrocardiograms: A survey and comparison. Applied Soft Computing, 11(2), 2216–2226.

    Google Scholar 

  • Chen, Z., & Hohmann, V. (2015). Online monaural speech enhancement based on periodicity analysis & a priori SNR estimation. IEEE/ACM Transactions on Audio, Speech, & Language Processing, 23(11), 1904–1916.

    Google Scholar 

  • Chmayssani, T., Baudoin, G., & Hendryckx, G. (2008). Secure communications through speech dedicated channels using digital modulations. In 2008 42nd Annual IEEE International Carnahan Conference on Security Technology (pp. 312–317).

  • Christiansen, T.U. Dau, T. Greenberg, S. (2007). Spectro-temporal processing of speech—An information-theoretic framework. In Hearing—From sensory processing to perception (pp. 59–523).

  • Cichocki, A., & Thawonmas, R. (2000). On-line algorithm for blind signal extraction of arbitrarily distributed, but temporally correlated sources using second order statistics. Neural Processing Letters, 12(1), 91–98.

    MATH  Google Scholar 

  • Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, & Signal Processing, 28(4), 357–366.

    Google Scholar 

  • Deshmukh, O. D., & Espy-Wilson, C. Y. (2007). Speech enhancement using the modified phase-opponency model. Journal of the Acoustical Society of America, 121(6), 3886–3898.

    Google Scholar 

  • Deshpande, G., Viraraghavan, V. S., Duggirala, M., Reddy, V. R., & Patel, S. (2017). Empirical evaluation of emotion classification accuracy for non-acted speech. In 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP).

  • Dey, N., Ashour, A. S., Shi, F., Fong, S. J., & Tavares, J. M. R. S. (2018). Medical cyber-physical systems: A survey. Journal of Medical Systems, 42(4), 1–13.

    Google Scholar 

  • Dhanj, S. & Eng, J.P. (2001). Artificial neural networks in speech processing: Problems & challenges. In 2001 IEEE Pacific Rim Conference on Communications, Computers & signal Processing. PACRIM (vol. 2, pp. 510–514).

  • Doi, H., Nakamura, K., Toda, T., Saruwatari, H., & Shikano, K. (2011). An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. In 2011 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) (pp. 5136–5140).

  • Donahue, C., Li, B., & Prabhavalkar, R. (2018). Exploring speech enhancement with generative adversarial networks for robust speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp.2018.8462581

  • El-Solh, A. &Cuhadar, A. &Goubran, R. (2008). Evaluation of speech enhancement techniques for speaker identification in noisy environments. In Ninth IEEE International Symposium on Multimedia Workshops (ISMW 2007) (pp. 235–239).

  • Ephraim, Y., & Malah, D. (1983). Speech enhancement using optimal non-linear spectral amplitude estimation. ICASSP ’83. In IEEE International Conference on Acoustics, Speech, and Signal Processing. https://doi.org/10.1109/icassp.1983.1171938

  • Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions of ASSP, 32(6), 1109–1121.

    Google Scholar 

  • Even, J., Saruwatari H., Shikano, K., Takatani, T. (2010). Speech enhancement in presence of diffuse background noise: Why using blind signal extraction. In 2010 IEEE International Conference on Acoustics, Speech & Signal Processing (pp. 4770–4774).

  • Faúndez-Zanuy, M. M., Esposito, S., Hussain, A., Schoentgen, J., Kubin, G., Kleijn, W. B., et al. (2002). Nonlinear speech processing: Overview & applications. Control & Intelligent Systems, 30(1), 1–9.

    Google Scholar 

  • Fakhri, M., Poorjam, A.H., Christensen, M.G. (2018). Speech enhancement by classification of noisy signals decomposed using NMF & Wiener filtering. In 2018 26th European Signal Processing Conference (EUSIPCO) (pp. 16–21).

  • Flamand, J., Le Bihan, N., Martin, A. V., & Manton, J. H. (2016). Low-resolution reconstruction of intensity functions on the sphere for single-particle diffraction imaging. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

  • Flynn, R., & Jones, E. (2008). Speech enhancement for distributed speech recognition in mobile devices. In 2008 Digest of Technical Papers—International Conference on Consumer Electronics (pp. 1459–1463).

  • Foth, M., Schroeter, R., & Ti, J. (2013). Opportunities of public transport experience enhancements with mobile services and urban screens. International Journal of Ambient Computing and Intelligence, 5(1), 1–18. https://doi.org/10.4018/jaci.2013010101.

    Article  Google Scholar 

  • Fu, Q. & Wan, E. (2003). Perceptual wavelet adaptive denoising of speech. In 8th European Conference on Speech Communication & Technology, Euro Speech 2003, September 1–4, 2003 (pp. 577–580).

  • Fukane, A. R., & Sahare, S. L. (2011). Enhancement of noisy speech signals for hearing aids. In 2011 International Conference on Communication Systems & Network Technologies (pp. 490–494).

  • Gabbay, A., Shamir, A. & Peleg, S. (2018). Visual speech enhancement. In Interspeech 2018 2–6 September 2018, Hyderabad (pp. 1–5).

  • Gao, D., & Zhao, X. (2013). A speech coding error control transmission scheme based on UEP for bandwidth-limited channels. In 2013 International Conference on Computational & Information Sciences (pp. 318–321).

  • Giacobello, D., Christensen, M. G., Dahl, J., Jensen, S., Moonen, M. (2005). Sparse linear predictors for speech processing. In Proceedings of the International Conference on Spoken Language Processing, 2008 (pp. 4–7).

  • Goalic, A., Trubuil, J., Lapierre, G., Labat, J. (2005). Real time low bit rate speech transmission through underwater acoustic channel. In Europe Oceans 2005, IEEE Xplore 03 October 2005 (pp. 319–321).

  • Goh, Z., Tan, K., & Tan, B. T. G. (1999). Kalman-filtering speech enhancement method based on a voiced-unvoiced speech model. IEEE Transactions on Speech & Audio Processing, 7(5), 510–524.

    Google Scholar 

  • Gupta, S., Khosravy, M., Gupta, N., & Darbari, H. (2019a). In-field failure assessment of tractor hydraulic system operation via pseudospectrum of acoustic measurements. Turkish Journal of Electrical Engineering & Computer Sciences, 27(4), 2718–2729.

    Google Scholar 

  • Gupta, S., Khosravy, M., Gupta, N., Darbari, H., & Patel, N. (2019b). Hydraulic system onboard monitoring and fault diagnostic in agricultural machine. Brazilian Archives of Biology and Technology. https://doi.org/10.1590/1678-4324-2019180363.

    Article  Google Scholar 

  • Hong Kook, K., & Cox, R. (2000).Bitstream-based feature extraction for wireless speech recognition. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat.No.00CH37100).

  • Hou, J.C., Wang, S.S., Lai, Y.H., Lin, J.C., Tsao, Y., Chang, H.W., & Wang, H.M. (2016). Audio-visual speech enhancement using deep neural networks. In 2016 Asia-Pacific Signal & Information Processing Association Annual Summit & Conference (APSIPA) (pp. 16–21).

  • Lee, H., Hu, T., Jing, H., Chang, Y., Tsao, Y., Kao, Y., & Pao, T. (2013). Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition. INTERSPEECH.

  • Hu, Y., & Loizou, P. C. (2004a). Incorporating a psycho acoustical model in frequency domain speech enhancement. IEEE Signal Processing Letters, 11(2), 270–273.

    Google Scholar 

  • Hu, Y., & Loizou, P. C. (2004b). Speech enhancement based onwavelet thresholding the multitaper spectrum. IEEE Transactions on Speech and Audio Processing, 12(1), 59–67. https://doi.org/10.1109/tsa.2003.819949.

    Article  Google Scholar 

  • Huang, H., Lee, T., Kleijn, W. B., & Kong, Y.-Y. (2015). A method of speech periodicity enhancement using transform-domain signal decomposition. Speech Communication, 67, 102–112.

    Google Scholar 

  • Islam, M. T., Shahnaz, C., & Fattah, S. A. (2014). Speech enhancement based on a modified spectral subtraction method. In 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS).

  • Jabloun, F., & Champagne, B. (2003). Incorporating the human hearing properties in the signal subspace approach for speech enhancement. IEEE Transactions of SAP, 11(6), 700–708.

    Google Scholar 

  • Jalil, M., Butt, F. A., & Malik, A. (2013). Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals. In 2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE) (pp. 208–212).

  • Jiang, Y., & Liu, R. (2017). A dual microphone speech enhancement method with a smoothing parameter mask. In 2017 10th International Congress on Image & Signal Processing, BioMedical Engineering & Informatics (CISP-BMEI) (pp. 386–391).

  • Jiang Y., Lu, X., Zu Y., Zhou, H. (2013). Classification-based close talk speech enhancement. In 2013 3rd International Conference on Consumer Electronics, Communications & Networks, 20–22 Nov. 2013 (pp. 192–197).

  • Johnstone, I. M., & Silverman, B. W. (1997). Wavelet threshold estimators for data with correlated noise. Journal of Royal Statistical Society, 59(2), 319–351.

    MathSciNet  MATH  Google Scholar 

  • Kalamani, M., Valarmathy, S., Poonkuzhali, C., Catherine, J.N. (2014). Feature selection algorithms for automatic speech recognition. In 2014 International Conference on Computer Communication & Informatics (pp. 2352–2356).

  • Kamper, H., Jansen, A., King, S., & Goldwater, S. (2014). Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings. In 2014 IEEE Spoken Language Technology Workshop (SLT). https://doi.org/10.1109/slt.2014.7078557

  • Karjol, P., Kumar, M.A., Ghosh, P.K. (2018). Speech enhancement using multiple deep neural networks. In 2018 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) (pp. 5049–5054).

  • Kesarkar, M. P. (2003). Feature extraction for speech recogntion, M.Tech. Credit seminar report, Electronic Systems Group, EE. Dept, IIT Bombay, November, 2003.

  • Khosravy, M., Asharif, M. R., & Yamashita, K. (2010). A theoretical discussion on the foundation of Stone’s blind source separation. Signal, Image and Video Processing, 5(3), 379–388.

    Google Scholar 

  • Khosravy, M., Gupta, N., Marina, N., Asharif, M. R., Asharif, F., & Sethi, I. K. (2015). Blind components processing a novel approach to array signal processing: A research orientation. In 2015 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS).

  • Kobayashi, K., & Toda, T. (2018). Electrolaryngeal speech enhancement with statistical voice conversion based on CLDNN. In 2018 26th European Signal Processing Conference (EUSIPCO) (pp. 1–5).

  • Koniaris, C., Chatterjee, S., & Kleijn, W. B. (2010). Selecting static and dynamic features using an advanced auditory model for speech recognition. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. https://doi.org/10.1109/icassp.2010.5495648

  • Kopparapu, S. K. (2009). A robust speech biometric system for vehicle access. In 2009 IEEE International Conference on Vehicular Electronics & Safety (ICVES) (pp. 174–177).

  • Krishnamoorthy, P., Mahadeva Prasanna, S. R. (2008). Temporal & spectral processing of degraded speech. In 16th International Conference on Advanced Computing & Communications (pp. 9–14).

  • Kulkarni, N., & Bairagi, V. (2018). Use of complexity features for diagnosis of Alzheimer disease. In EEG-Based Diagnosis of Alzheimer Disease (pp. 47–59). https://doi.org/10.1016/b978-0-12-815392-5.00004-6

  • Lai, Y.-H., Su, Y.-C., Tsao, Y., & Young, S.-T.(2013). Evaluation of generalized maximum a posteriori spectral amplitude (GMAPA) speech enhancement algorithm in hearing aids. In 2013 IEEE International Symposium on Consumer Electronics (ISCE) (pp. 245–248).

  • Lee, S., & Lee, G. (2016). Noise estimation and suppression using nonlinear function with A Priori speech absence probability in speech enhancement. Journal of Sensors, 2016, 1–7. https://doi.org/10.1155/2016/5352437.

    Article  Google Scholar 

  • Leng, X., Chen, J., Benesty, J., Cohen, I. (2018). On speech enhancement using microphone arrays in the presence of co-directional interference. In 2018 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) (pp. 675–680).

  • Li, H., Mäntymäki, M., & Zhang, X. (2014). Digital services and information intelligence. IFIP Advances in Information and Communication Technology. https://doi.org/10.1007/978-3-662-45526-5.

    Article  Google Scholar 

  • Li, W. (2008). Effective post-processing for single-channel frequency-domain speech enhancement. In 2008 IEEE International Conference on Multimedia & Expo (pp. 149–157).

  • Ma, R., Liu, G., Hao, Q., & Wang, C. (2017). Smart microphone array design for speech enhancement in financial VR & AR. In 2017 IEEE SENSORS (pp. 1012–1017).

  • Maina, C., & Walsh, J. M. (2011). Joint speech enhancement & speaker identification using approximate bayesian inference. IEEE Transactions on Audio, Speech, & Language Processing, 19(6), 1517–1529.

    Google Scholar 

  • Malathi, P., Sureshw, G. R., & Moorthi, M. (2018). Enhancement of electrolaryngeal speech using Frequency auditory masking & GMM based voice conversion. In 2018 Fourth International Conference on Advances in Electrical, Electronics, Information, Communication & Bio-Informatics (AEEICB) (pp. 978–981).

  • Manohar, K., & Rao, P. (2006). Speech enhancement in nonstationary noise environments using noise properties. Speech Communication, 48, 96–109.

    Google Scholar 

  • Manolov, A., Boumbarov, O., Manolova, A., Poulkov, V., Tonchev, K. (2017). Feature selection in affective speech classification. In 40th International Conference on Telecommunications & Signal Processing (TSP) (pp. 354–359).

  • Marchi, E., Ferroni, G., Eyben, F., Gabrielli, L., Squartini, S., & Schuller, B. (2014). Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

  • Matheja, T., Buck, M., & Fingscheidt, T. (2013). A dynamic multi-channel speech enhancement system for distributed microphones in a car environment. EURASIP Journal on Advances in Signal Processing, 2013(1), 144–149. https://doi.org/10.1186/1687-6180-2013-191.

    Article  Google Scholar 

  • Modhave, N., Karuna, Y., &Tonde, S. (2016). Design of matrix wiener filter for noise reduction & speech enhancement in hearing aids. In 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) (pp. 843–847).

  • Modhave, N., Karuna, Y., & Tonde, S. (2016). Design of multichannel wiener filter for speech enhancement in hearing aids & noise reduction technique. In 2016 Online International Conference on Green Engineering & Technologies (IC-GET) (pp. 556–559).

  • Mporas, I. Ganchev, T., Kocsis, O., Fakotakis, N. (2011). Dynamic selection of a speech enhancement method for robust speech recognition in moving motorcycle environment. In 2011 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) (pp. 5176–5180).

  • Mustière, F., Bouchard M. & Bolić, M. (2010). Bandwidth extension for speech enhancement. In CCECE (pp. 76–84).

  • Nabi, W., Aloui, N., &Cherif, A. (2016). An improved speech enhancement algorithm based on wavelets for mobile communication. In 2016 2nd International Conference on Advanced Technologies for Signal & Image Processing (ATSIP) (pp. 622–626).

  • Nakanishi, I., Nagata, Y., Itoh, Y., Fukui, Y. (2006). Single-channel speech enhancement based on frequency domain ALE. In 2006 IEEE International Symposium on Circuits & Systems (pp. 389–393).

  • Nakatani, T., Araki, S., Yoshioka, T., Delcroix, M., & Fujimoto, M. (2013). Dominance based integration of spatial & spectral features for speech enhancement. IEEE Transactions on Audio, Speech, & Language Processing, 21(12), 2516–2531.

    Google Scholar 

  • Nesbitt, D., Crookes, D., & Ji, M. (2018). Speech segment clustering for real-time exemplar-based speech enhancement. In 2018 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) (pp. 5419–5423).

  • Ortega-Garcia, J., Gonzalez-Rodriguez, J. (1996). Overview of speech enhancement techniques for automatic speaker recognition. Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96 (pp. 929–933).

  • Paliwal, K. K. (2003). Usefulness of phase in speech processing. In Proceedings IPSJ Spoken Language Processing Workshop (pp. 1–6).

  • Panahi, I., Kehtarnavaz, N., & Thibodeau, L. (2016). Smartphone-based noise adaptive speech enhancement for hearing aid applications. In 2016 38th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (pp. 85–89).

  • Panahi, I. M., Reddy, C. K. A., & Thibodeau, L. (2017). Noise suppression & speech enhancement for hearing aid applications using smartphones. In 2017 51st Asilomar Conference on Signals, Systems, & Computers (pp. 1890–1894).

  • Pandey, A., Wang, D. L., & Fellow, I. E. E. E. (2019). A new framework for CNN-based speech enhancement in the time domain. IEEE Transactions on Audio, Speech, & Language Processing, 27(7), 1179.

    Google Scholar 

  • Parchami, M., Zhu, W. P., Champagne, B., & Plourde, E. (2016). Recent developments in speech enhancement in the short-time fourier transform domain. IEEE Circuits & Systems Magazine, 16(3), 45–77.

    Google Scholar 

  • Pascual, S., Serra, J., & Bonafonte, A. (2019). Time-domain speech enhancement using generative adversarial networks. Speech Communication, 114, 10–21.

    Google Scholar 

  • Petrovie, P.M. (1985). Digitized speech transmission through Vhf Fm repeaters. In 35th IEEE Vehicular Technology Conference (pp. 205–210).

  • Plapous, C., Marro, C., & Scalart, P. (2006). Improved signal-to-noise ratio estimation for speech enhancement. IEEE Transactions on Audio, Speech, & Language Processing, 14(6), 2098–2108.

    Google Scholar 

  • Prabhu, C., Chellappan, C., & Ramachandran, B. (2012). Conference management & speech enhancement for multiparty video conference over the MPLS Networks. Information Technology Journal, 11(1), 85–93.

    Google Scholar 

  • Premananda, B. S., & Uma, B. V. (2013). Speech enhancement algorithm to reduce the effect of background noise in mobile phones. International Journal of Wireless & Mobile Networks (IJWMN), 5(1), 177–189.

    Google Scholar 

  • Priyanka, S.S. (2017). A review on adaptive beamforming techniques for speech enhancement. In International Conference on Innovations in Powerand Advanced Computing Technologies [i-PACT2017] (pp. 1–6).

  • Purushotham, U,. Suresh, K. (2016). Feature extraction in enhancing speech signal for mobile communication. In 2016 1st India International Conference on Information Processing (IICIP) (pp. 978–983).

  • Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted gaussian mixture models. Digital Signal Processing, 10(1–3), 19–41.

    Google Scholar 

  • Rezvani, M., Kahaei, M.H. (2015). Speech enhancement using transient components in frequency domain. In 2015 23rd Iranian Conference on Electrical Engineering (pp. 164–170).

  • Sadjadi, S.O. & Hansen, J.H.L. (2010). Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26–30, 2010 (pp. 2138–2141).

  • Sahu, P. K., & Ganesh, D. S. (2015).A study on automatic speech recognition toolkits. In 2015 International Conference on Microwave, Optical and Communication Engineering (ICMOCE). doi:10.1109/icmoce.2015.7489768

  • Saki, F. & Kehtarnavaz, N. (2016). Automatic switching between noise classification & speech enhancement for hearing aid devices. In 2016 38th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (pp. 736–740)

  • Santos, E., Khosravy, M., Lima, M. A., Cerqueira, A. S., Duque, C. A., & Yona, A. (2019). High accuracy power quality evaluation under a colored noisy condition by filter bank ESPRIT. Electronics, 8(11), 1259.

    Google Scholar 

  • Santosh, K. C., Borra, S., Joshi, A., & Dey, N. (2019). Advances in speech, music and audio signal processing. International Journal of Speech Technology, 22(2), 293–296.

    Google Scholar 

  • Sarria-Paja, M., Senoussaoui, M., & Falk, T. H. (2015). The effects of whispered speech on state-of-the-art voice based biometrics systems. In 2015 IEEE 28th Canadian Conference on Electrical & Computer Engineering (CCECE) (pp. 1254–1259).

  • Sasaoka, N., Shimada, K., Sonobe, S., Itoh, Y., & Fujii, K. (2009). Speech enhancement based on adaptive filter with variable step size for wideband and periodic noise. In: 2009 52nd IEEE International Midwest Symposium on Circuits and Systems. https://doi.org/10.1109/mwscas.2009.5236011.

  • Scalart, P. & Vieira-Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. In Proceedings of IEEE ICASSP’96, Atlanta, GA, May 1996 (pp. 629–632).

  • Sedani, B. S., Kotak, N. A., Borisagar, K. R., & Kulkarni, G. R. (2012).Implementation & Performance analysis of efficient wireless channels in WiMAX using image & speech transmission. In 2012 International Conference on Communication Systems & Network Technologies (pp. 630–634).

  • Sen, S., Dutta, A., Dey, N. (2019). Audio indexing. In Audio processing and speech recognition. SpringerBriefs in applied sciences and technology (pp. 1–11). Singapore: Springer

  • Sen, S., Dutta, A., Dey, N. (2019), Speech processing and recognition system. In Audio processing and speech recognition. SpringerBriefs in applied sciences and technology (pp. 13–43). Singapore: Springer.

  • Sen S., Dutta A., Dey, N. (2019) Audio classification. In Audio processing and speech recognition. SpringerBriefs in applied sciences and technology (pp. 67–93). Singapore: Springer.

  • Sharma, U., Maheshkar, S., Mishra, A. N. (2015). Study of robust feature extraction techniques for speech recognition system. In 2015 International Conference on Futuristic Trends on Computational Analysis & Knowledge Management (ABLAZE) (pp. 654–659).

  • Shen, L., Zheng, N., Zheng, S., & Li, W. (2010). Secure mobile services by face & speech based personal authentication. In 2010 IEEE International Conference on Intelligent Computing & Intelligent Systems (pp. 97–100).

  • Shrawankar, U. & Thakare, V. (2010). Noise estimation & noise removal techniques for speech recognition in adverse environment, ifip international federation for information processing 1310. In IIP 1310, IFIP AICT 340 (pp. 336–342).

  • Shukla, A., Tiwari, R., & Rathore, C. P. (2010). Neuro-fuzzy-based biometric system using speech features. International Journal of Biometrics, 2(4), 391–406.

    Google Scholar 

  • Shujau, M., Ritz, C. H., & Burnett, I. S. (2010). Speech enhancement via separation of sources from co-located microphone recordings. In 2010 IEEE International Conference on Acoustics, Speech & Signal Processing (pp. 137–140).

  • Soliman, N. F., Mostfa, Z., El-Samie, F. E. A., & Abdalla, M. I. (2017). Performance enhancement of speaker identification systems using speech encryption & cancelable features. International Journal of Speech Technology, 20(9), 977–1004.

    Google Scholar 

  • Srinonchat, J. (2005). Improvement of the clustering technique to design a codebook in speech coding. In 2005 5th International Conference on Information Communications & Signal Processing (pp. 833–837).

  • Thomas, S., Ganapathy, S., & Hermansky, H. (2008). Recognition of reverberant speech using frequency domain linear prediction. IEEE Signal Processing Letters, 15, 681–684.

    Google Scholar 

  • Thulasimani, L. (2012). Text dependent speech based biometric for mobile security. International Journal of Computer Applications, 51(17), 35–40.

    Google Scholar 

  • Toda, T. (2014). Augmented speech production based on real-time statistical voice conversion. In 2014 IEEE Global Conference on Signal & Information Processing (GlobalSIP) (pp. 592–597).

  • Treichler, J., & Agee, B. (1983). A new approach to multipath correction of constant modulus signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 31(2), 459–472.

    Google Scholar 

  • Tu, M. & Zhang, X. (2017). Speech enhancement based on deep neural networks with skip connections. In 2017 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP) (pp. 5565–5570).

  • Vijayan, K. Xiaoxue, G. Li, H. (2018). Analysis of speech & singing signals for temporal alignment. In Conference: Asia-Pacific Signal & Information Processing Association Annual Summit & Conference (pp. 1–5).

  • Virag, N. (1999). Single channel speech enhancement based on masking properties of the human auditory system. IEEE Transactions on Speech and Audio Processing, 7(2), 126–137. https://doi.org/10.1109/89.748118.

  • Vu, N.-V., Ye, H., Whittington, J., Devlin, J., & Mason, M. (2010). Small footprint implementation of dual-microphone delay-and-sum beamforming for in-car speech enhancement. In 2010 IEEE International Conference on Acoustics, Speech & Signal Processing (pp. 1482–1485).

  • Wan, E. A., & van der Merwe, R. (2001). The unscented Kalman filter. In Kalman filtering and neural networks (Adaptive and learning systems for signal processing, communications, and control), ch. 7 (pp. 221–280). Wiley.

  • Wang, D., Fan, Z., & Li, B. (2010). An adaptive beamforming method based on post-multistage wiener filter for the speech enhancement. In 2010 2nd International Conference on Signal Processing Systems (ICSPS) (pp. 360–362).

  • Xu, Y., Du, J., Dai, L.-R., & Lee, C.-H. (2014). An experimental study on speech enhancement based on deep neural networks. IEEE Signal Processing Letters, 21(1), 65–68.

  • Yamin, M., & Sen, A. A. A. (2018). Improving privacy and security of user data in location based services. International Journal of Ambient Computing and Intelligence, 9(1), 19–42. https://doi.org/10.4018/ijaci.2018010102.

  • Yan, Z., Zhenmin, T., & Yanping, L. (2009). Combining speech enhancement & discriminative feature extraction for robust speaker recognition. In 2009 WRI World Congress on Computer Science & Information Engineering (pp. 274–279).

  • Yelwande, A., Kansal, S., & Dixit, A. (2017). Adaptive Wiener filter for speech enhancement. In 2017 International Conference on Information, Communication, Instrumentation and Control (ICICIC). https://doi.org/10.1109/icomicon.2017.8279110.

  • Yoshizawa, T., Hirobayashi, S., & Misawa, T. (2011). Noise reduction for periodic signals using high-resolution frequency analysis. EURASIP Journal on Audio, Speech, and Music Processing, 2011, Article 5, 1–19.

  • Yu, C., & Su, L. (2015). Speech enhancement based on the generalized sidelobe cancellation & spectral subtraction for a microphone array. In 2015 8th International Congress on Image & Signal Processing (CISP) (pp. 1318–1323).

  • Yu, H., Ouyang, Z., Zhu, W.-P., Champagne, B., & Ji, Y. (2019). A deep neural network based Kalman filter for time domain speech enhancement. In 2019 IEEE International Symposium on Circuits & Systems (ISCAS) (pp. 397–403).

  • Yu, W., He, H., & Zhang, N. (Eds.). (2009). A probabilistic short-length linear predictability approach to blind source separation. In 23rd International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2008), Yamaguchi, Japan; Advances in Neural Networks—ISNN 2009. Lecture Notes in Computer Science.

  • Zhang, E., Antoni, J., Dong, B., & Snoussi, H. (2012). Bayesian space-frequency separation of wide-band sound sources by a hierarchical approach. The Journal of the Acoustical Society of America, 132(5), 3240–3250. https://doi.org/10.1121/1.4754530.

  • Zhang, L., & Zhang, B. (1999). A geometrical representation of McCulloch–Pitts neural model and its applications. IEEE Transactions on Neural Networks, 10(4), 925–928.

  • Zhang, S., Shao, F., & Yu, Y. (2009). Unequal error protection of MELP compressed speech based on plotkin type LDPC code. In 2009 WRI International Conference on Communications & Mobile Computing (pp. 166–169). https://doi.org/10.1109/cmc.2009.94.

  • Zhang, Q., Wang, M., & Zhang, L. (2017). A robust speech enhancement method based on microphone array. In 2017 IEEE 17th International Conference on Communication Technology (ICCT) (pp. 1673–1678).

  • Zhao, Q., Yang, Y., & Li, H. (2014). A novel and efficient voice activity detector using shape features of speech wave. In Lecture Notes in Computer Science (pp. 375–384). https://doi.org/10.1007/978-3-319-12484-1_42

  • Zhou, H., Sadka, A., & Richard, M. J. (2008). Speech enhancement in noisy environments for video retrieval. In 9th International Workshop on Image Analysis for Multimedia Interactive Services. IEEE, AUT (pp. 197–200).

Author information

Corresponding author

Correspondence to Nabanita Das.

About this article

Cite this article

Das, N., Chakraborty, S., Chaki, J. et al. Fundamentals, present and future perspectives of speech enhancement. Int J Speech Technol 24, 883–901 (2021). https://doi.org/10.1007/s10772-020-09674-2

  • DOI: https://doi.org/10.1007/s10772-020-09674-2
