Abstract
The search for a flexible and concise alternative representation of digital musical sound leads us to propose the MIDI (Musical Instrument Digital Interface) protocol. The problem then becomes one of automating the conversion from sound to MIDI, which requires processing musical sound and extracting the information needed to represent it as MIDI data. We have conducted studies that led to algorithms for segmenting the sound and detecting the pitch of individual notes. We describe a novel method for pitch detection using subset selection with dictionaries containing harmonic spectra from samples of musical sounds. Examples are given that demonstrate applicability to monophonic sounds as well as to signals with multiple sound sources, including detection of objects in a complex background scene.
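To make the idea of pitch detection by subset selection over a harmonic dictionary more concrete, the following is a minimal sketch only, not the algorithm of the article. It assumes a greedy matching-pursuit selection rule, synthetic templates with 1/k partial amplitudes in place of spectra sampled from real instruments, and a fixed 5 Hz frequency grid. All names (`harmonic_atom`, `build_dictionary`, `detect_notes`) and all parameter values are hypothetical choices made for illustration.

```python
import math

BIN_HZ = 5.0        # assumed resolution of the spectrum grid, in Hz
N_BINS = 1800       # grid covers 0 to 9 kHz
N_HARMONICS = 8     # assumed number of partials per template


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def harmonic_atom(note):
    """Unit-norm synthetic magnitude spectrum with 1/k partial amplitudes.

    Stand-in for a spectrum measured from an instrument sample.
    """
    spectrum = [0.0] * N_BINS
    f0 = midi_to_hz(note)
    for k in range(1, N_HARMONICS + 1):
        b = int(round(k * f0 / BIN_HZ))
        if b < N_BINS:
            spectrum[b] += 1.0 / k
    norm = math.sqrt(sum(x * x for x in spectrum))
    return [x / norm for x in spectrum]


def build_dictionary(low=48, high=84):
    """One template per MIDI note in [low, high], keyed by note number."""
    return {n: harmonic_atom(n) for n in range(low, high + 1)}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def detect_notes(spectrum, atoms, max_notes=4, rel_threshold=0.1):
    """Greedy matching pursuit; returns MIDI notes in the order selected."""
    residual = list(spectrum)
    # Stop once the best match is small relative to the input's norm.
    floor = rel_threshold * math.sqrt(dot(spectrum, spectrum))
    notes = []
    for _ in range(max_notes):
        note, weight = max(
            ((n, dot(residual, a)) for n, a in atoms.items()),
            key=lambda pair: pair[1],
        )
        if weight <= floor:
            break
        notes.append(note)
        # Remove the selected template's contribution from the residual.
        residual = [r - weight * a for r, a in zip(residual, atoms[note])]
    return notes


if __name__ == "__main__":
    atoms = build_dictionary()
    # Two simultaneous sources: C4 (note 60) plus a quieter G4 (note 67).
    mixture = [a + 0.8 * b for a, b in zip(atoms[60], atoms[67])]
    print(detect_notes(mixture, atoms))
```

In this toy setting the input is built from the dictionary itself, so the mixture of notes 60 and 67 is expected to be resolved into those two notes. Real recordings would need a measured spectrum, instrument-specific templates, and probably a less greedy selection rule.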
Cite this article
Sieger, N.J., Tewfik, A.H. Audio Coding for Representation in MIDI via Pitch Detection Using Harmonic Dictionaries. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 20, 45–59 (1998). https://doi.org/10.1023/A:1008074130468
DOI: https://doi.org/10.1023/A:1008074130468