
A selection function for pitched instrument source separation

  • Regular Paper
Multimedia Systems

Abstract

Many methods exist for pitched instrument source separation. The core problem is separating harmonics that overlap in time and frequency. To obtain better results, we propose a selection function that picks the best harmonic separation result among existing methods. Our strategy is based on the observation that a good separation result usually has a low total amplitude fluctuation. For source harmonics that overlap in a frequency band, each method produces a separation result. From the harmonics separated by each method, we estimate the total amplitude fluctuation of each group of overlapping harmonics. The selection function maps each band index to a method index by choosing the method with the minimum total amplitude fluctuation. Experiments are conducted on sample mixtures from the University of Iowa Musical Instrument Sample Database. Three advanced separation techniques are compared: common amplitude modulation (CAM), harmonic bandwidth companding (HBW-comp), and ideal binary mask (IBM) filtering. Experimental results indicate that the proposed selection function significantly boosts separation performance.
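The band-to-method selection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the fluctuation measure used here (the sum of absolute frame-to-frame envelope differences, summed over the sources in a band) and the function names are assumptions chosen for clarity.

```python
import numpy as np

def total_amplitude_fluctuation(envelopes):
    """Total amplitude fluctuation of one method's result in one band.

    envelopes: array of shape (n_sources, n_frames), the amplitude
    envelopes of the overlapping harmonics that the method separated.
    Assumed measure: sum of absolute frame-to-frame differences.
    """
    return float(np.sum(np.abs(np.diff(envelopes, axis=1))))

def select_method_per_band(band_envelopes):
    """Map each band index to the index of the selected method.

    band_envelopes: list over bands; each entry is a list over methods
    of (n_sources, n_frames) arrays. For each band, the method whose
    separated harmonics show the minimum total amplitude fluctuation
    is selected.
    """
    selection = []
    for methods in band_envelopes:
        fluctuations = [total_amplitude_fluctuation(e) for e in methods]
        selection.append(int(np.argmin(fluctuations)))
    return selection
```

In a full system, the per-band outputs of CAM, HBW-comp, and IBM filtering would populate `band_envelopes`, and the resynthesis stage would take each band from the method the selection function chose.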


References

  1. Koteswararao, Y.V., Rao, C.B.R.: Multichannel speech separation using hybrid GOMF and enthalpy-based deep neural networks. Multimedia Syst. 27, 271–286 (2021)

  2. Xie, L., Fu, Z., Feng, W., Luo, Y.: Pitch-density-based features and an SVM binary tree approach for multi-class audio classification in broadcast news. Multimedia Syst. 17, 101–11 (2011)

  3. Rafii, Z., Liutkus, A., Stoter, F.R., Mimilakis, S.I., FitzGerald, D., Pardo, B.: An overview of lead and accompaniment separation in music. IEEE/ACM Trans. Audio Speech Lang. Process. 26(8) (2018)

  4. Li, Y., Woodruff, J.: Monaural musical sound separation based on pitch and common amplitude modulation. IEEE Trans. Audio Speech Lang. Process. 17(7), 1361–1371 (2009)

  5. Zivanovic, M.: Harmonic bandwidth companding for separation of overlapping harmonics in pitched signals. IEEE/ACM Trans. Audio Speech Lang. Process. 23(5), 898–908 (2015)

  6. Hu, G., Wang, D.L.: Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Trans. Neural Networks 15(5), 1135–1150 (2004)

  7. Stoter, F.R., Liutkus, A., Badeau, R., Edler, B., Magron, P.: Common fate model for unison source separation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2016)

  8. Pishdadian, F., Pardo, B.: Multi-resolution common fate transform. IEEE/ACM Trans. Audio Speech Lang. Process. 27(2), 342–354 (2019)

  9. Tachibana, H., Ono, N., Sagayama, S.: Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms. IEEE/ACM Trans. Audio Speech Lang. Process. 22(1), 228–237 (2014)

  10. Moore, B.C.J.: An Introduction to the Psychology of Hearing. Academic Press (1997)

  11. The University of Iowa Musical Instrument Sample Database. [Online]. Available: http://theremin.music.uiowa.edu/

  12. Vincent, E., Gribonval, R., Fevotte, C.: Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)

  13. Wang, D.L., Brown, G.J.: Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley/IEEE Press, Hoboken (2006)

  14. Li, Y., Wang, D.L.: Separation of singing voice from music accompaniment for monaural recordings. IEEE Trans. Audio Speech Lang. Process. 15(4), 1475–1487 (2007)

  15. Serra, X.: Musical sound modeling with sinusoids plus noise. In: Musical Signal Processing (1997)

  16. McAulay, R., Quatieri, T.: Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoustic Speech Signal Process. 34(4), 744–754 (1986)

  17. Fevotte, C., Godsill, S.J.: A Bayesian approach for blind separation of sparse sources. IEEE Trans. Audio Speech Lang. Process. 14(6), 2174–2188 (2006)

  18. Ozerov, A., Philippe, P., Bimbot, F., Gribonval, R.: Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs. IEEE Trans. Audio Speech Lang. Process. 15(5), 1564–1578 (2007)

  19. Casey, M.A., Westner, W.: Separation of mixed audio sources by independent subspace analysis. In: Proceedings of International Computer Music Conference (2000)

  20. Virtanen, T.: Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria. IEEE Trans. Audio Speech Lang. Process. 15(3), 1066–1074 (2007)

  21. Abdallah, S.A., Plumbley, M.D.: Unsupervised analysis of polyphonic music by sparse coding. IEEE Trans. Neural Networks 17(1), 179–196 (2006)

  22. Huang, P.S., Kim, M., Hasegawa-Johnson, M., Smaragdis, P.: Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 23 (2015)

  23. Chandna, P., Miron, M., Janer, J., Gómez, E.: Monoaural audio source separation using deep convolutional neural networks. In: 13th International Conference on Latent Variable Analysis and Signal Separation (2017)

  24. Hershey, J.R., Chen, Z., Roux, J.L., Watanabe, S.: Deep clustering: discriminative embeddings for segmentation and separation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2016)

  25. Luo, Y., Chen, Z., Hershey, J.R., Roux, J.L., Mesgarani, N.: Deep clustering and conventional networks for music separation: stronger together. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2017)

  26. Grais, E.M., Roma, G., Simpson, A.J.R., Plumbley, M.D.: Two-stage single-channel audio source separation using deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 25(9), 1773–1783 (2017)

  27. Every, M.R., Szymanski, J.E.: Separation of synchronous pitched notes by spectral filtering of harmonics. IEEE Trans. Audio Speech Lang. Process. 14(5), 1845–1856 (2006)

  28. Virtanen, T., Klapuri, A.: Separation of harmonic sounds using multipitch analysis and iterative parameter estimation. In: Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 83–86 (2001)

  29. Bay, M., Beauchamp, J.W.: Harmonic source separation using prestored spectra. In: Independent Component Analysis and Blind Signal Separation, pp. 561–568 (2006)

  30. Duan, Z., Zhang, Y., Zhang, C., Shi, Z.: Unsupervised single-channel music source separation by average harmonic structure modeling. IEEE Trans. Audio Speech Lang. Process. 16(4), 766–778 (2008)

  31. Gong, Y., Shu, X., Tang, J.: Recovering overlapping partials for monaural perfect harmonic musical sound separation using modified common amplitude modulation. In: Pacific Rim Conference on Multimedia, pp. 903–912 (2017)

  32. Jensen, K.: Timbre models of musical sounds. Ph.D. dissertation, University of Copenhagen (1999)

Author information

Correspondence to Longquan Dai.

Additional information

Communicated by X. Yang.


Cite this article

Gong, Y., Dai, L. & Tang, J. A selection function for pitched instrument source separation. Multimedia Systems 28, 311–319 (2022). https://doi.org/10.1007/s00530-021-00836-z
