Abstract
In this paper, we propose a deep recurrent neural network (DRNN), based on the Long Short-Term Memory (LSTM) unit, for separating the drum and bass sources from a monaural audio track. Specifically, one DRNN with a total of six hidden layers (three feedforward and three recurrent) is used for each original source to be separated. In this work, we restrict our attention to two challenging sources: drum and bass. Experimental results show the effectiveness of the proposed approach with respect to another state-of-the-art method. Results are expressed in terms of well-known metrics in the field of source separation.
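The abstract does not detail how the per-source network outputs are combined, but separation systems of this kind typically apply a time-frequency soft mask to the mixture spectrogram. The sketch below illustrates that masking step in NumPy; all shapes, variable names, and the random placeholder "network outputs" are illustrative assumptions, not the authors' implementation. The simplified SDR function is likewise only an SNR-style stand-in for the full evaluation metrics.

```python
import numpy as np

# Illustrative soft-mask separation: given magnitude estimates for the two
# sources (random placeholders standing in for the per-source DRNN outputs),
# build a Wiener-like soft mask and apply it to the mixture magnitude.
rng = np.random.default_rng(0)
F, T = 513, 100                      # frequency bins x time frames (assumed)
drum_est = rng.random((F, T))        # placeholder drum-network output
bass_est = rng.random((F, T))        # placeholder bass-network output
mix_mag = drum_est + bass_est        # toy mixture magnitude (additive)

eps = 1e-12                          # guard against division by zero
mask_drum = drum_est / (drum_est + bass_est + eps)
mask_bass = 1.0 - mask_drum          # masks sum to one per TF bin

drum_sep = mask_drum * mix_mag       # separated drum magnitude
bass_sep = mask_bass * mix_mag       # separated bass magnitude

def sdr_db(ref, est):
    """Simplified signal-to-distortion ratio in dB (no projection step)."""
    return 10.0 * np.log10(np.sum(ref ** 2) / (np.sum((ref - est) ** 2) + eps))

# In this additive toy example the masked estimates reconstruct the mixture
# exactly, so the two separated magnitudes sum back to mix_mag.
assert np.allclose(drum_sep + bass_sep, mix_mag)
```

Because the toy mixture is built additively in the magnitude domain, the mask recovers each source essentially exactly; with a real mixture the phases interfere, which is why evaluation relies on metrics such as SDR rather than exact reconstruction.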
Notes
1. Available at: http://medleydb.weebly.com/.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Scarpiniti, M., Scardapane, S., Comminiello, D., Parisi, R., Uncini, A. (2019). Separation of Drum and Bass from Monaural Tracks. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_13
DOI: https://doi.org/10.1007/978-3-319-95098-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95097-6
Online ISBN: 978-3-319-95098-3
eBook Packages: Intelligent Technologies and Robotics (R0)