Abstract
Finding syllable boundaries can be straightforward or challenging, depending on the language. Text-to-speech applications have been shown to perform considerably better when syllabication, whether orthographic or phonetic, is used to break the text down into units below word level. Romanian syllabication is non-trivial, mainly but not exclusively because of its hiatus-diphthong ambiguity, a phenomenon that affects both phonetic and orthographic syllabication. In this paper, we focus on orthographic syllabication for Romanian and show that the task can be carried out with a high degree of accuracy by using sequence tagging. We compare this approach to support vector machines and rule-based methods. The features we use are simply character n-grams with end-of-word marking.
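As a rough illustration of the setup described in the abstract, the sketch below turns a hyphenated word into per-character boundary labels and extracts character n-gram features around each candidate boundary. It is not the authors' implementation. The function names, the window size, the maximum n-gram length, the `$` padding symbol (used here at both word edges, whereas the abstract mentions only end-of-word marking) and the example word are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple


def boundary_labels(hyphenated: str) -> Tuple[str, List[int]]:
    """Return the bare word and one label per character.

    Label 1 means a syllable boundary follows that character, 0 means none.
    The hyphenated input format is an assumption of this sketch.
    """
    word = hyphenated.replace("-", "")
    labels = []
    for i, ch in enumerate(hyphenated):
        if ch == "-":
            continue
        nxt = hyphenated[i + 1] if i + 1 < len(hyphenated) else ""
        labels.append(1 if nxt == "-" else 0)
    return word, labels


def char_ngram_features(word: str, pos: int, window: int = 2,
                        max_n: int = 2, pad: str = "$") -> Dict[str, str]:
    """Character n-grams around the candidate boundary after word[pos].

    The context holds `window` characters on each side of the boundary.
    Positions past the word edges are filled with `pad`, which acts as the
    word-boundary marker (hypothetical choice of symbol).
    """
    padded = pad * window + word + pad * window
    center = pos + window
    context = padded[center - window + 1:center + window + 1]
    features = {}
    for n in range(1, max_n + 1):
        for start in range(len(context) - n + 1):
            # Key encodes n-gram length and offset relative to the boundary.
            features[f"{n}[{start - window}]"] = context[start:start + n]
    return features


def word_to_sequence(hyphenated: str) -> Tuple[List[Dict[str, str]], List[int]]:
    """One feature dictionary and one label per character of the word."""
    word, labels = boundary_labels(hyphenated)
    feats = [char_ngram_features(word, i) for i in range(len(word))]
    return feats, labels


feats, labels = word_to_sequence("co-pil")
print(labels)
print(feats[-1])
```

Each word yields a sequence of feature dictionaries paired with a label sequence, which could then be passed to a sequence tagger or, position by position, to a classifier such as a support vector machine.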
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dinu, L.P., Niculae, V., Sulea, OM. (2013). Romanian Syllabication Using Machine Learning. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_57
DOI: https://doi.org/10.1007/978-3-642-40585-3_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science (R0)