Abstract
This paper presents a baseline spoken document retrieval system for Finnish that is based on unlimited vocabulary continuous speech recognition. Because Finnish is agglutinative, its speech cannot be adequately transcribed using standard large vocabulary continuous speech recognition approaches. Defining a sufficient lexicon and training statistical language models are both difficult, because words appear in many forms produced by inflection and compounding. In this work we apply a recently developed language model that enables n-gram models over morpheme-like subword units discovered in an unsupervised manner. In addition to word-based indexing, we also propose indexing based on the subword units provided directly by our speech recognizer, and a combination of the two. In an initial evaluation on Finnish news reading, we obtained a fairly low recognition error rate and average document retrieval precisions close to those obtainable from human reference transcripts.
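To make the idea of combining word-based and subword-based indexing concrete, the following is a minimal sketch, not the system described in the paper. It assumes each document is already available both as a list of recognized words and as a list of morpheme-like units. The segmentations are hand-made for illustration, and the function names, the term-overlap scoring, and the `weight` parameter are all hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def score(index, query_terms):
    """Count how many distinct query terms each document matches."""
    scores = defaultdict(int)
    for term in set(query_terms):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return scores

def combined_rank(word_index, morph_index, query_words, query_morphs, weight=0.5):
    """Rank documents by a weighted sum of normalized word and morph scores.

    The linear combination is an assumption made for this sketch only.
    """
    w = score(word_index, query_words)
    m = score(morph_index, query_morphs)
    n_w = max(len(set(query_words)), 1)
    n_m = max(len(set(query_morphs)), 1)
    combined = {
        d: weight * w.get(d, 0) / n_w + (1 - weight) * m.get(d, 0) / n_m
        for d in set(w) | set(m)
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))

# Toy transcripts: inflected word forms and illustrative segmentations.
word_docs = {"d1": ["talossa", "asuu", "mies"], "d2": ["talon", "katto"]}
morph_docs = {"d1": ["talo", "ssa", "asu", "u", "mies"], "d2": ["talo", "n", "katto"]}

word_index = build_index(word_docs)
morph_index = build_index(morph_docs)

# The base form "talo" matches no inflected word form, but matches both
# documents through the shared subword unit.
word_hits = score(word_index, ["talo"])
ranking = combined_rank(word_index, morph_index, ["talo"], ["talo"])
```

In this toy case the word index alone retrieves nothing for the query, while the subword index retrieves both documents, which is the kind of mismatch between inflected forms that motivates indexing on subword units.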
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kurimo, M., Turunen, V., Ekman, I. (2005). Speech Transcription and Spoken Document Retrieval in Finnish. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_22
DOI: https://doi.org/10.1007/978-3-540-30568-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)