Abstract
Blind separation of musical sounds contained in sound mixtures is a challenging task. This is because, in Western music, mixed harmonic sources may be correlated with each other: their harmonic partials may overlap in the frequency domain whenever the signals stand in a harmonic relation. Evaluating the separation results is also problematic, since the energy-based error between the original signals used for mixing and the separated ones does not, in some cases, correspond with perceptual evaluation results. In this paper, four separation algorithms developed by the authors are presented. Musical instrument sound identification based on artificial neural networks is then performed as a means of evaluating the performance of the separation algorithms. Results are discussed and conclusions are drawn.
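The overlap of harmonic partials mentioned above can be illustrated with a short sketch (not from the paper; the function name and tolerance are illustrative assumptions): two harmonic sources whose fundamentals stand in a simple integer ratio share many partial frequencies, which is precisely what makes energy-based separation of such mixtures difficult.

```python
def overlapping_partials(f0_a, f0_b, n_partials=20, tol_hz=1.0):
    """Return the partial frequencies (Hz) of source B that coincide,
    within tol_hz, with a partial of source A. Illustrative sketch:
    both sources are modeled as ideally harmonic (partials at k * f0)."""
    partials_a = [k * f0_a for k in range(1, n_partials + 1)]
    return [k * f0_b
            for k in range(1, n_partials + 1)
            if any(abs(k * f0_b - fa) <= tol_hz for fa in partials_a)]

# A perfect fifth (3:2 ratio), e.g. C4 (261.63 Hz) and G4 (392.445 Hz):
# every even-numbered partial of the upper note lands on a partial
# of the lower note, so the two spectra are heavily entangled.
shared = overlapping_partials(261.63, 261.63 * 1.5)
print(shared)
```

For intervals with less simple frequency ratios the overlap shrinks, which is one reason separation difficulty varies strongly with the musical interval between the mixed sources.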
Dziubinski, M., Dalka, P. & Kostek, B. Estimation of Musical Sound Separation Algorithm Effectiveness Employing Neural Networks. J Intell Inf Syst 24, 133–157 (2005). https://doi.org/10.1007/s10844-005-0320-x