Abstract
The paper describes the development of a large vocabulary continuous speech recogniser for the Slovenian language using the SNABI database. The problems that inflectional languages pose for speech recognition are presented. The system is based on hidden Markov models. Biphones were used for acoustic modeling, and bigrams and trigrams for language modeling. To improve recognition results and enable fast operation of the recogniser, speaker adaptation is also used. The optimal system, with the adapted acoustic model and the bigram language model, achieved a word accuracy of 91.30% at nearly 10× real time. The unadapted system with the trigram language model achieved a word accuracy of 89.56%, but it was also slower than the optimal system, running at 15.3× real time.
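The abstract reports results as word accuracy and as a multiple of real time. The following is a minimal sketch of how such figures are commonly computed. It assumes the usual definition Acc = (N − S − D − I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions found by aligning the recognised word string against the reference. It also assumes that "× real time" means processing time divided by audio duration. The abstract does not name its scoring tool. HTK's HResults uses weighted alignment costs, whereas this sketch uses a plain unit-cost alignment, so counts may differ slightly on some utterances. All function and class names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def word_accuracy(self) -> float:
        """Word accuracy in percent: (N - S - D - I) / N * 100."""
        n = self.reference_length
        errors = self.substitutions + self.deletions + self.insertions
        return 100.0 * (n - errors) / n


def align(reference: list[str], hypothesis: list[str]) -> AlignmentCounts:
    """Unit-cost minimum edit-distance alignment (a simplifying assumption)."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = (total_errors, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # only deletions possible
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # only insertions possible
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            t, s, d, ins = cost[i - 1][j - 1]
            diag = (t + mismatch, s + mismatch, d, ins)  # match or substitution
            t, s, d, ins = cost[i - 1][j]
            up = (t + 1, s, d + 1, ins)  # deletion of a reference word
            t, s, d, ins = cost[i][j - 1]
            left = (t + 1, s, d, ins + 1)  # insertion of a hypothesis word
            cost[i][j] = min(diag, up, left)  # fewest total errors wins
    _, s, d, ins = cost[-1][-1]
    return AlignmentCounts(s, d, ins, len(reference))


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time expressed as a multiple of the audio duration."""
    return processing_seconds / audio_seconds
```

For example, aligning the reference `a b c d` with the hypothesis `a x c d e` gives one substitution and one insertion, so the word accuracy is (4 − 1 − 0 − 1) / 4 = 50%. Insertions are penalised, which is why accuracy can be lower than the percentage of correctly recognised words.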
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Žgank, A., Kačič, Z., Horvat, B. (2001). Large Vocabulary Continuous Speech Recognizer for Slovenian Language. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_32
DOI: https://doi.org/10.1007/3-540-44805-5_32
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive