A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system

Ri, Hyok-Chol

doi:10.1007/s10772-019-09637-2

Published: 25 September 2019

Volume 22, pages 971–977, (2019)

International Journal of Speech Technology

Abstract

In large vocabulary continuous speech recognition (LVCSR), choosing a suitable recognition unit is important for system performance. In Korean continuous speech recognition, the morph rather than the word is the basic recognition unit because Korean is agglutinative, and good performance is obtained by combining high-frequency morph sequences, which in turn leads to a larger vocabulary and a high out-of-vocabulary (OOV) rate. Sub-lexical units such as syllables and graphones are widely used for inflectional languages, but they have not been introduced successfully into Korean speech recognition because they carry little linguistic information. In this paper, we investigate the use of a syllable unit to resolve the mismatch between recognition unit and vocabulary size that frequently occurs in Korean large vocabulary speech recognition. We apply local segmentation into syllables based on morphological statistics and perform experiments using a language model (LM) built from mixed unit types: morphemes, combined morphemes, and syllables. Compared with a traditional LM consisting of morphemes and combined morphemes, the proposed model yields an absolute reduction of around 0.4% in word error rate (WER).
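
As a rough illustration of the mixed-unit idea described in the abstract, the following Python sketch keeps frequent morphs as whole units and splits rare ones into syllables. It is not the author's method: the count threshold `min_count`, the `+` syllable marker, the function names, and the toy corpus are all assumptions made for this example, and the paper's actual segmentation criterion based on morphological statistics is not reproduced here. The sketch relies on precomposed Hangul, where each syllable is a single Unicode code point.

```python
from collections import Counter

# Hypothetical marker that tags a unit as a syllable rather than a morph.
SYLLABLE_MARK = "+"


def count_morphs(corpus):
    """Count how often each morph occurs in a morph-segmented corpus."""
    return Counter(morph for sentence in corpus for morph in sentence)


def to_mixed_units(sentence, counts, min_count=2):
    """Keep frequent morphs whole; split rare or unseen ones into syllables.

    Assumes precomposed Hangul, so iterating over a string yields syllables.
    The frequency threshold is an illustrative stand-in for the paper's
    morphological statistics.
    """
    units = []
    for morph in sentence:
        if counts[morph] >= min_count:
            units.append(morph)
        else:
            units.extend(syllable + SYLLABLE_MARK for syllable in morph)
    return units


# Toy morph-segmented training corpus (invented for this example).
corpus = [
    ["학교", "에", "간다"],
    ["학교", "에서", "공부", "한다"],
    ["도서관", "에", "간다"],
]

counts = count_morphs(corpus)
mixed_corpus = [to_mixed_units(sentence, counts) for sentence in corpus]

for units in mixed_corpus:
    print(" ".join(units))
```

A morph never seen in training has a count of zero, so it is also decomposed into syllables, which is how a syllable inventory can cover items that would otherwise be out of vocabulary.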

Acknowledgements

We appreciate the helpful discussions with Dr. Kim and Prof. Ri, and we thank the anonymous reviewers and editors for their many invaluable comments and suggestions for improving this paper.

Author information

Authors and Affiliations

College of Information Science, KIM IL SUNG University, Pyongyang, Democratic People’s Republic of Korea
Hyok-Chol Ri

Corresponding author

Correspondence to Hyok-Chol Ri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ri, HC. A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system. Int J Speech Technol 22, 971–977 (2019). https://doi.org/10.1007/s10772-019-09637-2

Received: 04 March 2019
Accepted: 16 September 2019
Published: 25 September 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10772-019-09637-2
