Abstract
Sound-imitation words (SIWs), or onomatopoeia, are important for human-computer interaction and for the automatic tagging of sound archives. The main problem in automatic SIW recognition is ambiguity in determining phonemes: different listeners hear the same environmental sound as different SIWs, even in the same situation. To solve this problem, we designed a new set of phonemes, called the basic phoneme-group set, to represent environmental sounds, in addition to a set of articulation-based phoneme-groups. Automatic SIW recognition based on a Hidden Markov Model (HMM) with the basic phoneme-groups is allowed to generate multiple SIWs in order to absorb the ambiguities caused by listener and situation dependency. Listening experiments with seven subjects showed that automatic SIW recognition based on the basic phoneme-groups outperformed recognition based on the articulation-based phoneme-groups and recognition based on Japanese phonemes. The proposed system is therefore better suited for use in computer interaction.
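The idea of letting a phoneme-group recognizer emit several SIWs at once can be illustrated with a small sketch. It assumes that the HMM stage has already produced a sequence of phoneme-group labels and that each group stands for a set of Japanese phonemes; every SIW spelled with one member phoneme per group is then a candidate. The group names and memberships below (`PHONEME_GROUPS`, `expand_candidates`) are hypothetical and for illustration only; they are not the paper's basic phoneme-group set, and the HMM decoder itself is not shown.

```python
from itertools import product

# Hypothetical phoneme-groups: each label stands for several Japanese
# phonemes a listener might plausibly hear. Illustrative only; these are
# NOT the basic phoneme-groups defined in the paper.
PHONEME_GROUPS: dict[str, list[str]] = {
    "UnvoicedPlosive": ["k", "t", "p"],
    "VoicedPlosive": ["g", "d", "b"],
    "OpenVowel": ["a", "o"],
    "CloseVowel": ["i", "u"],
    "MoraicNasal": ["N"],
}


def expand_candidates(group_sequence: list[str]) -> list[str]:
    """Expand a recognized sequence of phoneme-group labels into every
    SIW candidate that uses one member phoneme from each group."""
    choices = [PHONEME_GROUPS[label] for label in group_sequence]
    # The Cartesian product enumerates all per-group phoneme choices.
    return ["".join(phonemes) for phonemes in product(*choices)]


if __name__ == "__main__":
    print(expand_candidates(["UnvoicedPlosive", "OpenVowel", "MoraicNasal"]))
```

Run as a script, it prints `['kaN', 'koN', 'taN', 'toN', 'paN', 'poN']`: one group-level recognition result covers six phoneme-level SIWs. A recognizer that outputs single Japanese phonemes would have to commit to just one of these.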
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishihara, K., Nakatani, T., Ogata, T., Okuno, H.G. (2004). Automatic Sound-Imitation Word Recognition from Environmental Sounds Focusing on Ambiguity Problem in Determining Phonemes. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol. 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_96
DOI: https://doi.org/10.1007/978-3-540-28633-2_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22817-2
Online ISBN: 978-3-540-28633-2
eBook Packages: Springer Book Archive