A new Genetic Algorithm based fusion scheme in monaural CASA system to improve the performance of the speech

Shoba, S.; Rajavel, R.

doi:10.1007/s12652-019-01309-y

Original Research
Published: 06 May 2019

Volume 11, pages 433–446, (2020)
Journal of Ambient Intelligence and Humanized Computing

Abstract

This research work proposes a new Genetic Algorithm (GA) based fusion scheme to effectively fuse the Time–Frequency (T–F) binary masks of voiced and unvoiced speech. Perceptual cues such as the correlogram, cross-correlogram and pitch are commonly used to obtain the T–F binary mask of voiced speech. More recently, researchers have used speech onset and offset to segment unvoiced speech from the noisy speech mixture. Most studies that use onset and offset to represent unvoiced speech combine the segments of unvoiced speech with the segments of voiced speech to obtain the T–F binary mask. Instead of combining the segments, this research work uses a GA to fuse the T–F binary masks of voiced and unvoiced speech. Moreover, a new method is proposed to obtain a T–F binary mask from the segments of unvoiced speech. The performance of the proposed GA based fusion scheme is evaluated in terms of speech quality and intelligibility. The experimental results show that, compared with the noisy speech mixture, the proposed system enhances speech quality by increasing the SNR by an average of 10.74 dB and reducing the noise residue by an average of 26.15%. Compared with conventional speech segregation systems, it enhances speech intelligibility by increasing the CSII, NCM and STOI by averages of 0.22, 0.20 and 0.17, respectively.
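The abstract does not say how the GA encodes the fusion or what it optimizes, so the following is only a minimal sketch under our own assumptions, not the authors' method. Each chromosome is a hypothetical triple of genes `(w_v, w_u, theta)`: a T–F unit is labeled target when `w_v * voiced + w_u * unvoiced >= theta`. Fitness is taken to be HIT minus FA against an ideal binary mask (IBM), which assumes clean reference speech is available for tuning. The names `fuse`, `hit_minus_fa` and `ga_fuse` are illustrative.

```python
import random
from typing import List, Tuple

Mask = List[List[int]]              # rows = frequency channels, cols = time frames
Genes = Tuple[float, float, float]  # hypothetical encoding: (w_v, w_u, theta)


def fuse(voiced: Mask, unvoiced: Mask, w_v: float, w_u: float, theta: float) -> Mask:
    """Label a T-F unit 1 when the weighted voiced/unvoiced evidence reaches theta."""
    return [[1 if w_v * v + w_u * u >= theta else 0 for v, u in zip(rv, ru)]
            for rv, ru in zip(voiced, unvoiced)]


def hit_minus_fa(est: Mask, ideal: Mask) -> float:
    """HIT = share of target-dominant units kept; FA = share of noise-dominant units kept."""
    hits = fas = targets = maskers = 0
    for re, ri in zip(est, ideal):
        for e, i in zip(re, ri):
            if i:
                targets += 1
                hits += e
            else:
                maskers += 1
                fas += e
    hit = hits / targets if targets else 0.0
    fa = fas / maskers if maskers else 0.0
    return hit - fa


def ga_fuse(voiced: Mask, unvoiced: Mask, ideal: Mask, pop_size: int = 20,
            generations: int = 40, mutation_sd: float = 0.1,
            seed: int = 0) -> Tuple[Genes, Mask]:
    """Search (w_v, w_u, theta) with a simple elitist GA; assumes pop_size >= 4."""
    rng = random.Random(seed)

    def fitness(genes: Genes) -> float:
        return hit_minus_fa(fuse(voiced, unvoiced, *genes), ideal)

    # Random initial population with every gene in [0, 1).
    pop = [(rng.random(), rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, Gaussian mutation, genes clamped to [0, 1].
            child = tuple(
                min(1.0, max(0.0, rng.choice((x, y)) + rng.gauss(0.0, mutation_sd)))
                for x, y in zip(a, b))
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fuse(voiced, unvoiced, *best)
```

With strictly binary input masks, these three genes can only select a Boolean combination rule (OR, AND, voiced-only, unvoiced-only, and so on), so a GA is more than this toy needs. The same loop becomes more meaningful if the genes are extended, for example to per-channel weights or thresholds, which is again an assumption rather than something the abstract states.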

Author information

Authors and Affiliations

SSN College of Engineering, Old Mahabalipuram Road, Chennai, 603 110, India
S. Shoba & R. Rajavel

Corresponding author

Correspondence to S. Shoba.

About this article

Cite this article

Shoba, S., Rajavel, R. A new Genetic Algorithm based fusion scheme in monaural CASA system to improve the performance of the speech. J Ambient Intell Human Comput 11, 433–446 (2020). https://doi.org/10.1007/s12652-019-01309-y

Received: 30 May 2018
Accepted: 28 April 2019
Published: 06 May 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s12652-019-01309-y