Blind signal separation with Noise Reduction for efficient speaker identification

International Journal of Speech Technology

Abstract

Most blind signal separation algorithms address the separation problem in the absence of noise, and the presence of noise degrades the quality of the separated signals. This paper deals with the blind separation of audio signals from noisy mixtures. Instead of performing the separation on the mixtures in the time domain, the blind signal separation algorithm is applied to the discrete cosine transform, the discrete sine transform, or the discrete wavelet transform of the mixed signals. All of these transforms have an energy compaction property: they concentrate most of the signal energy in a few transform-domain coefficients and leave the remaining coefficients close to zero. As a result, the separation is performed on a few coefficients in the transform domain. Another advantage of separation in a transform domain is that noise affects the signals less there than in the time domain. The paper also investigates the role of speech enhancement techniques as pre- and post-processing steps for the blind signal separation process. The speech enhancement techniques considered are spectral subtraction, Wiener filtering, adaptive Wiener filtering, and wavelet denoising. Both blind signal separation and noise reduction are applied within a real speaker identification system to reduce the effect of interference and noise on system performance. The simulation results confirm the superiority of transform-domain separation over time-domain separation and the importance of wavelet denoising when used as a pre-processing step for noise reduction. Moreover, blind signal separation and noise reduction enhance the performance of the speaker identification system.
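The transform-domain idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the separation algorithm, so a small kurtosis-based rotation search stands in for it. The synthetic sources, the sampling rate, the mixing matrix `A`, the 10% coefficient budget, and the helper names `compaction` and `ica_2x2` are all assumptions made for illustration. The sketch relies on the DCT being linear, so mixtures of the sources remain mixtures (with the same mixing matrix) after the transform.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
n, fs = 4096, 8000.0                      # assumed length and sampling rate
t = np.arange(n) / fs

# Hypothetical stand-ins for two speakers: a decaying tone and a square wave.
S = np.vstack([np.sin(2 * np.pi * 220 * t) * np.exp(-3 * t),
               0.5 * np.sign(np.sin(2 * np.pi * 150 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])    # assumed instantaneous mixing matrix
X = A @ S + 0.01 * rng.standard_normal((2, n))   # noisy mixtures


def compaction(c, frac=0.1):
    """Fraction of total energy held by the largest `frac` of the values."""
    e = np.sort(c ** 2)[::-1]
    k = max(1, int(frac * e.size))
    return e[:k].sum() / e.sum()


def ica_2x2(Z, n_angles=720):
    """Illustrative two-channel ICA: whiten, then grid-search the rotation
    that maximises the summed absolute kurtosis of the outputs."""
    Z = Z - Z.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Z @ Z.T / Z.shape[1])
    V = E @ np.diag(d ** -0.5) @ E.T      # whitening matrix
    Y = V @ Z
    best_score, best_R = -np.inf, np.eye(2)
    for th in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        R = np.array([[np.cos(th), np.sin(th)],
                      [-np.sin(th), np.cos(th)]])
        U = R @ Y
        score = np.abs((U ** 4).mean(axis=1) - 3.0).sum()
        if score > best_score:
            best_score, best_R = score, R
    return best_R @ V                     # unmixing matrix for the input domain


# Transform the mixtures; the DCT acts along time, the mixing across channels.
C = dct(X, type=2, norm="ortho", axis=1)

# Estimate the unmixing matrix from the highest-energy coefficients only.
k = n // 10
idx = np.argsort((C ** 2).sum(axis=0))[-k:]
W = ica_2x2(C[:, idx])

# Unmix all coefficients and return to the time domain.
S_hat = idct(W @ C, type=2, norm="ortho", axis=1)

# Absolute correlation between each estimate and each true source.
corr = np.abs(np.corrcoef(S_hat, S)[:2, 2:])
print("energy in top 10% (time, DCT):", compaction(X[0]), compaction(C[0]))
print("abs. correlation, estimates vs. sources:\n", corr)
```

The recovered signals are defined only up to order, sign, and scale, which is why the check uses absolute correlations. The grid search works only for two channels; for more sources, a general ICA routine would be needed in its place.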

Author information

Correspondence to Walid El-Shafai.

Cite this article

Hammam, H., El-Shafai, W., Hassan, E. et al. Blind signal separation with Noise Reduction for efficient speaker identification. Int J Speech Technol 24, 235–250 (2021). https://doi.org/10.1007/s10772-019-09641-6

  • DOI: https://doi.org/10.1007/s10772-019-09641-6
