Abstract
This paper analyzes the effects of a soft-masking function on spectrogram-based instrument-vocal separation of audio signals. The function considered is first-order and has two masking magnitude parameters: one for background separation and one for foreground separation. As the masking magnitude increases, the signal estimates improve: the background signal’s spectrogram becomes closer to that of the original signal, while the foreground signal’s spectrogram represents the vocal wiggle lines better than the original signal’s spectrogram does. For the same increase in masking magnitude (up to ten-fold), the effect on the background signal’s spectrogram is more pronounced than the effect on the foreground signal’s. This is evident in the roughly three-fold reduction of the background signal’s root-mean-square (RMS) values, compared with a reduction of approximately one-third in the foreground signal’s RMS values.
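The abstract does not state the masking formula, so the following is only a minimal sketch under assumptions: it uses the common ratio-style soft mask x^p / (x^p + x_ref^p) with p = 1 as a stand-in for the first-order function, and treats the two masking magnitudes as multipliers on the competing component. The names `soft_mask`, `separate`, `margin_bg`, `margin_fg`, and the toy magnitude values are hypothetical and do not come from the paper.

```python
import math


def soft_mask(x, x_ref, power=1.0):
    """Ratio soft mask x^p / (x^p + x_ref^p) for non-negative magnitudes.

    Assumed form, not confirmed by the paper. Returns 0.0 when both inputs are 0.
    """
    num = x ** power
    den = num + x_ref ** power
    return num / den if den > 0 else 0.0


def separate(full, background_est, margin_bg, margin_fg, power=1.0):
    """Split magnitude bins into background and foreground estimates.

    full           -- magnitudes of the mixture (one value per bin)
    background_est -- a rough background estimate for the same bins
    margin_bg/fg   -- hypothetical masking magnitudes for each mask
    """
    bg_out, fg_out = [], []
    for s_full, s_bg in zip(full, background_est):
        s_bg = min(s_bg, s_full)      # background cannot exceed the mixture
        residual = s_full - s_bg      # what is left is treated as foreground
        mask_bg = soft_mask(s_bg, margin_bg * residual, power)
        mask_fg = soft_mask(residual, margin_fg * s_bg, power)
        bg_out.append(mask_bg * s_full)
        fg_out.append(mask_fg * s_full)
    return bg_out, fg_out


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Toy magnitudes (made up for illustration only).
full = [1.0, 0.8, 0.5, 0.2]
background_est = [0.6, 0.2, 0.4, 0.1]

for margin in (1.0, 10.0):
    bg, fg = separate(full, background_est, margin, margin)
    print(margin, rms(bg), rms(fg))
```

In this toy setting a larger masking magnitude makes each mask more selective, so the RMS of both estimates falls as the magnitude rises. The size of the reduction depends entirely on the made-up inputs and says nothing about the figures reported in the paper.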
Acknowledgment
The authors would like to thank FPT University, Hanoi, Vietnam, and UCSI University, Kuala Lumpur, Malaysia, for supporting this research.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tran, D.C., Ahamed Khan, M.K.A. (2020). Effects of Soft-Masking Function on Spectrogram-Based Instrument - Vocal Separation. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_28
DOI: https://doi.org/10.1007/978-981-15-6168-9_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6167-2
Online ISBN: 978-981-15-6168-9
eBook Packages: Computer Science (R0)