Abstract
Automatic speech and sound recognition typically involves some measure of distance between training and (possibly time-warped) test samples. Special problems arise when the spectral samples of interest are intermittent and contain temporal patterns of alternating periods of sounds and pauses that are significant for recognition. In such cases a recognizer must be capable of distinguishing between the end-points and the pauses of digitized samples and economically searching the segmented sounds for the occurrence of significant spectral patterns. The usual distance metrics based on conventional dynamic time warping algorithms may be inappropriate because time-warping often corrupts the temporal structure of the sound. The problem can be solved by first searching a test sample for distinctive temporal patterns and, if more than one match is obtained, using a spectral distance measure to classify the sample with its nearest neighbor among these. Computational advantages can be obtained if both the temporal and spectral templates are maintained in a binary format reflecting the important sound components.
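The two-stage procedure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the energy threshold, the use of Hamming distance for both stages, the error tolerance, and every name (`binarize`, `best_temporal_offset`, `classify`, `max_temporal_errors`) are assumptions made for illustration. Temporal templates are binary sound/pause sequences that are slid across the test pattern without warping; spectral templates are binary vectors marking the important frequency bands.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Bits = Sequence[int]


def hamming(a: Bits, b: Bits) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("bit sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def binarize(values: Sequence[float], threshold: float) -> List[int]:
    """Map frame values to 1 (sound) or 0 (pause). The threshold is an assumption."""
    return [1 if v >= threshold else 0 for v in values]


def best_temporal_offset(pattern: Bits, template: Bits) -> Tuple[int, int]:
    """Slide the template over the pattern with no time warping.

    Returns (smallest Hamming distance, offset at which it occurs).
    """
    if len(template) > len(pattern):
        raise ValueError("template is longer than the test pattern")
    best = (len(template) + 1, -1)
    for offset in range(len(pattern) - len(template) + 1):
        d = hamming(pattern[offset:offset + len(template)], template)
        if d < best[0]:
            best = (d, offset)
    return best


def classify(
    pattern: Bits,
    spectrum: Bits,
    templates: Dict[str, Tuple[Bits, Bits]],
    max_temporal_errors: int = 1,  # hypothetical tolerance
) -> Optional[str]:
    """Stage 1: keep classes whose temporal template matches the pattern.
    Stage 2: if several remain, pick the nearest spectral template."""
    candidates = [
        label
        for label, (temporal, _) in templates.items()
        if len(temporal) <= len(pattern)
        and best_temporal_offset(pattern, temporal)[0] <= max_temporal_errors
    ]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return min(candidates, key=lambda label: hamming(spectrum, templates[label][1]))


# Made-up data: classes "A" and "B" share a temporal pattern and differ spectrally.
templates = {
    "A": ([1, 1, 0, 0, 1, 1], [1, 0, 1, 0]),
    "B": ([1, 1, 0, 0, 1, 1], [0, 1, 0, 1]),
    "C": ([1, 0, 1, 0, 1, 0], [1, 0, 1, 0]),
}
envelope = [0.1, 0.9, 0.8, 0.1, 0.2, 0.7, 0.9, 0.1]
pattern = binarize(envelope, threshold=0.5)
print(classify(pattern, [1, 0, 1, 1], templates))
```

In this example the temporal stage rules out "C", and the spectral comparison is needed only to separate "A" from "B". Because both stages operate on bit sequences, the distances could also be computed with XOR and bit counting on packed integers, which is one way the binary format might yield the computational advantage the abstract mentions.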
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pinkowski, B. (1991). A template-based approach for recognition of intermittent sounds. In: Sherwani, N.A., de Doncker, E., Kapenga, J.A. (eds) Computing in the 90's. Great Lakes CS 1989. Lecture Notes in Computer Science, vol 507. Springer, New York, NY. https://doi.org/10.1007/BFb0038472
DOI: https://doi.org/10.1007/BFb0038472
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-97628-0
Online ISBN: 978-0-387-34815-5
eBook Packages: Springer Book Archive