Speech Transcription and Spoken Document Retrieval in Finnish

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Abstract

This paper presents a baseline spoken document retrieval system for Finnish that is based on unlimited vocabulary continuous speech recognition. Because Finnish is agglutinative, its speech cannot be adequately transcribed using standard large vocabulary continuous speech recognition approaches. Defining a sufficient lexicon and training statistical language models are both difficult, because words appear transformed by many inflections and compounds. In this work we apply a recently developed language model that enables n-gram models of morpheme-like subword units discovered in an unsupervised manner. In addition to word-based indexing, we also propose indexing based on the subword units provided directly by our speech recognizer, and a combination of the two. In an initial evaluation on Finnish newsreading, we obtained a fairly low recognition error rate and average document retrieval precisions close to what can be obtained from human reference transcripts.
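To make the idea of combining word-based and subword-based indexing concrete, the following is a minimal sketch and not the system described in the paper. The function names (`build_index`, `score`, `combined_ranking`), the interpolation weight, the term-frequency scoring, and the hand-made segmentations of the Finnish word forms are all assumptions chosen for illustration; the paper's recognizer and its unsupervised segmentation are not reproduced here.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index


def score(index, query_terms):
    """Sum the frequencies of matching terms per document (a crude score)."""
    scores = Counter()
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return scores


def combined_ranking(word_index, morph_index, query_words, query_morphs, weight=0.5):
    """Rank documents by a linear mix of word-level and subword-level scores.

    The linear interpolation and its weight are illustrative assumptions.
    """
    w = score(word_index, query_words)
    m = score(morph_index, query_morphs)
    total = {d: weight * w[d] + (1 - weight) * m[d] for d in set(w) | set(m)}
    # Highest score first; ties broken by document id for determinism.
    return sorted(total, key=lambda d: (-total[d], d))


# Hypothetical transcripts: whole word forms and hand-made subword splits.
word_docs = {"a": ["talossa"], "b": ["talon"], "c": ["autossa"]}
morph_docs = {"a": ["talo", "ssa"], "b": ["talo", "n"], "c": ["auto", "ssa"]}

word_index = build_index(word_docs)
morph_index = build_index(morph_docs)

# The base form "talo" matches no inflected word form, but matches as a subword.
ranking = combined_ranking(word_index, morph_index, ["talo"], ["talo"])
```

In this toy data the word-level index returns nothing for the query `talo`, because only inflected forms occur in the transcripts, while the subword-level index still retrieves documents `a` and `b`. That is the kind of mismatch that motivates indexing subword units for a heavily inflected language.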

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kurimo, M., Turunen, V., Ekman, I. (2005). Speech Transcription and Spoken Document Retrieval in Finnish. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_22

  • DOI: https://doi.org/10.1007/978-3-540-30568-2_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24509-4

  • Online ISBN: 978-3-540-30568-2

  • eBook Packages: Computer Science (R0)