Effects of Soft-Masking Function on Spectrogram-Based Instrument - Vocal Separation

Tran, Duc Chung; Ahamed Khan, M. K. A.

doi:10.1007/978-981-15-6168-9_28

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1215)

Included in the following conference series:

International Conference of the Pacific Association for Computational Linguistics

Abstract

This paper analyzes the effects of a soft-masking function on spectrogram-based instrument - vocal separation of audio signals. The function considered is first-order and has two masking magnitude parameters: one for background separation and one for foreground separation. As the masking magnitude increases, the signal estimates improve: the background signal's spectrogram becomes closer to that of the original signal, while the foreground signal's spectrogram represents the vocal wiggle lines better than the original signal's spectrogram does. For the same increase in masking magnitude (up to ten-fold), the effect on the background signal's spectrogram is more significant than the effect on the foreground signal's. This is evident in the significant reduction ($\approx$ three times) of the background signal's root-mean-square (RMS) values and the less significant reduction (approximately one-third) of the foreground signal's RMS values.
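The abstract does not state the masking formula, so the following is only a minimal sketch of how a first-order soft mask with two magnitude parameters might look. It assumes the ratio form used by librosa's `softmask` utility (librosa is cited in the reference list), reimplemented here in NumPy. The names `soft_mask`, `separate`, `margin_bg`, `margin_fg`, and `rms` are hypothetical, and the spectrograms are synthetic random data, not the paper's signals or results.

```python
import numpy as np

def soft_mask(x, x_ref, power=1.0):
    """Ratio-style soft mask x**p / (x**p + x_ref**p), with values in [0, 1].

    Assumed form (not taken from the paper); power=1.0 is the first-order case.
    """
    x = np.asarray(x, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    num = x ** power
    den = num + x_ref ** power
    # Where both inputs are zero the ratio is undefined; 0.5 is an arbitrary fill.
    return np.divide(num, den, out=np.full_like(num, 0.5), where=den > 0)

def separate(s_full, s_bg_est, margin_bg, margin_fg, power=1.0):
    """Split a magnitude spectrogram into background and foreground estimates.

    margin_bg and margin_fg play the role of the two masking magnitudes.
    """
    s_bg_est = np.minimum(s_full, s_bg_est)   # background cannot exceed the mix
    residual = s_full - s_bg_est              # what the background estimate misses
    mask_bg = soft_mask(s_bg_est, margin_bg * residual, power)
    mask_fg = soft_mask(residual, margin_fg * s_bg_est, power)
    return mask_bg * s_full, mask_fg * s_full

def rms(a):
    """Root-mean-square value of an array."""
    return float(np.sqrt(np.mean(np.square(a))))

# Synthetic magnitude "spectrograms" (frequency bins x frames), for illustration only.
rng = np.random.default_rng(0)
s_full = rng.uniform(0.1, 1.0, size=(8, 16))
s_bg_est = s_full * rng.uniform(0.2, 0.9, size=s_full.shape)

for margin in (1.0, 10.0):
    bg, fg = separate(s_full, s_bg_est, margin, margin)
    print(f"margin={margin:>4}: RMS background={rms(bg):.4f}, RMS foreground={rms(fg):.4f}")
```

Under this assumed form, raising either magnitude makes the corresponding mask stricter, so the RMS of both estimates falls as the magnitude grows. That matches the direction of the trend reported in the abstract, but the size of the reduction here depends entirely on the synthetic input.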

Acknowledgment

The authors would like to thank FPT University, Hanoi, Vietnam, and UCSI University, Kuala Lumpur, Malaysia, for supporting this research.

Author information

Authors and Affiliations

Computing Fundamental Department and FPT Technology Research Institute, FPT University, Hoa Lac Hi-Tech Park, Hanoi, 155300, Vietnam
Duc Chung Tran
Faculty of Engineering Technology and Built Environment, UCSI University, Kuala Lumpur, Malaysia
M. K. A. Ahamed Khan

Authors

Duc Chung Tran
M. K. A. Ahamed Khan

Corresponding author

Correspondence to Duc Chung Tran.

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Le-Minh Nguyen
University of Engineering and Technology, Hanoi, Vietnam
Xuan-Hieu Phan
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Kôiti Hasida
Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Satoshi Tojo

About this paper

Cite this paper

Tran, D.C., Ahamed Khan, M.K.A. (2020). Effects of Soft-Masking Function on Spectrogram-Based Instrument - Vocal Separation. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_28

DOI: https://doi.org/10.1007/978-981-15-6168-9_28
Published: 02 July 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6167-2
Online ISBN: 978-981-15-6168-9
eBook Packages: Computer Science, Computer Science (R0)
