Abstract
The work presented in this paper deals with text normalization for highly inflectional languages, focusing on abbreviation expansion and on the normalization of numerals. Our text normalization system does not use any explicit parser or part-of-speech tagger and can therefore be called lightly supervised. The standard rule-based text normalization method is compared with the proposed statistical method on the task of expanding Czech abbreviations.
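To make the core problem concrete, here is a minimal sketch of statistical abbreviation expansion in an inflectional language. In Czech, an abbreviation such as "str." (strana, "page") must be expanded into the correctly inflected form (strana, strany, straně, stranu), and the surrounding words determine which form is right. This is an illustration of the general idea only, not the method proposed in the paper. The candidate lexicon `CANDIDATES`, the toy corpus `CORPUS`, the functions `train` and `expand`, and the interpolation weight `lam` are all hypothetical. The sketch picks the candidate with the highest interpolated bigram score given the preceding word, using only raw expanded text with no parser or part-of-speech tagger.

```python
from collections import Counter

# Hypothetical abbreviation lexicon: each abbreviation maps to candidate
# inflected expansions (four case forms of the Czech noun "strana", page).
CANDIDATES = {
    "str.": ["strana", "strany", "straně", "stranu"],
}

# Hypothetical toy corpus of already-expanded Czech text; only raw word
# sequences are used, with no parser or part-of-speech tags.
CORPUS = [
    "text je na straně 12",
    "obrázek je na straně 5",
    "čtěte od strany 10",
    "pokračujte od strany 3",
    "strana 7 chybí",
]


def train(sentences):
    """Count unigrams and bigrams (with a sentence-start marker) over lowercased tokens."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def expand(sentence, unigrams, bigrams, lam=0.9):
    """Expand known abbreviations, choosing the candidate form with the highest
    interpolated score lam * P(form | previous word) + (1 - lam) * P(form)."""
    total = sum(unigrams.values())

    def score(form, prev):
        prev_count = unigrams[prev]
        p_bigram = bigrams[(prev, form)] / prev_count if prev_count else 0.0
        p_unigram = unigrams[form] / total
        return lam * p_bigram + (1 - lam) * p_unigram

    tokens = ["<s>"] + sentence.split()
    out = []
    for prev, tok in zip(tokens, tokens[1:]):
        candidates = CANDIDATES.get(tok.lower())
        if candidates is None:
            out.append(tok)  # not a known abbreviation: copy it unchanged
            continue
        best = max(candidates, key=lambda form: score(form, prev.lower()))
        # Preserve capitalisation of a sentence-initial abbreviation.
        out.append(best.capitalize() if tok[0].isupper() else best)
    return " ".join(out)


unigrams, bigrams = train(CORPUS)
print(expand("Obrázek je na str. 8", unigrams, bigrams))  # Obrázek je na straně 8
print(expand("Čtěte od str. 4", unigrams, bigrams))       # Čtěte od strany 4
```

Here the preposition preceding the abbreviation ("na" versus "od") is enough to select the case form. A rule-based expander would instead need such preposition-to-case patterns written by hand, whereas this sketch learns them from counts.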
This research was supported by the Grant Agency of the Czech Republic (project No. GAČR 102/08/0707), the Technology Agency of the Czech Republic (project No. TA01011264), and the Ministry of Education of the Czech Republic (project No. MŠMT LC536).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zelinka, J., Romportl, J., Müller, L. (2011). Statistical-Based Abbreviation Expansion. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_39
DOI: https://doi.org/10.1007/978-3-642-23538-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)