Speech Transcription and Spoken Document Retrieval in Finnish

Kurimo, Mikko; Turunen, Ville; Ekman, Inger

doi:10.1007/978-3-540-30568-2_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

This paper presents a baseline spoken document retrieval system for Finnish that is based on unlimited-vocabulary continuous speech recognition. Because Finnish is agglutinative, its speech cannot be adequately transcribed with standard large-vocabulary continuous speech recognition approaches. Defining a sufficient lexicon and training statistical language models are both difficult, because words appear in many forms produced by inflection and compounding. In this work we apply a recently developed language model that builds n-gram models over morpheme-like subword units discovered in an unsupervised manner. In addition to word-based indexing, we propose indexing based on the subword units provided directly by our speech recognizer, as well as a combination of the two. In an initial evaluation on Finnish news reading, we obtained a fairly low recognition error rate and average document retrieval precisions close to those obtainable from human reference transcripts.
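
The difference between word-based and subword-based indexing can be illustrated with a toy example. The Python sketch below is not the system described in the paper: the three documents, their hand-made morph segmentations, the tf-idf weighting, and the names build_index, score, and rank are all assumptions chosen for illustration. It only shows why an index over morph-like units can match inflected forms (talossa, talon) that an index over whole word forms misses, and one simple way the two scores might be combined.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection. The morph segmentations are written by hand
# for illustration; they are NOT output of the recognizer in the paper.
WORD_DOCS = {
    "d1": ["talossa", "asuu", "perhe"],
    "d2": ["talon", "katto", "vuotaa"],
    "d3": ["sää", "on", "kylmä"],
}
MORPH_DOCS = {
    "d1": ["talo", "ssa", "asu", "u", "perhe"],
    "d2": ["talo", "n", "katto", "vuota", "a"],
    "d3": ["sää", "on", "kylmä"],
}


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index


def score(index, n_docs, query_terms):
    """Sum a simple tf * idf weight over the query terms for each document."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores


def rank(word_docs, morph_docs, word_query, morph_query, weight=0.5):
    """Rank documents by a weighted sum of word-index and morph-index scores.

    The linear combination and the default weight are assumptions.
    """
    n_docs = len(word_docs)
    w = score(build_index(word_docs), n_docs, word_query)
    m = score(build_index(morph_docs), n_docs, morph_query)
    combined = {
        d: weight * w.get(d, 0.0) + (1 - weight) * m.get(d, 0.0)
        for d in set(w) | set(m)
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


# The base form "talo" never occurs as a whole word form in the collection,
# so only the morph index can contribute matches for this query.
print(rank(WORD_DOCS, MORPH_DOCS, ["talo"], ["talo"]))
```

In this toy collection the query word talo matches nothing in the word index, because only the inflected forms talossa and talon occur, while the morph index retrieves both documents. A word-based index would typically need morphological normalization of words to achieve the same match.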

Author information

Authors and Affiliations

Neural Networks Research Centre, Helsinki University of Technology, FI-02150, Espoo, Finland
Mikko Kurimo & Ville Turunen
Department of Information Studies, University of Tampere, Finland
Inger Ekman

Editor information

Editors and Affiliations

IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
IDIAP Research Institute, CH-1920, Martigny, Switzerland
Hervé Bourlard

About this paper

Cite this paper

Kurimo, M., Turunen, V., Ekman, I. (2005). Speech Transcription and Spoken Document Retrieval in Finnish. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_22

DOI: https://doi.org/10.1007/978-3-540-30568-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)
