Abstract
Syllabification may seem trivial for humans, but it is a challenging task for automated text analysis. In this study, we explore three approaches to syllabifying Romanian words using state-of-the-art deep learning architectures for sequence prediction, namely BiLSTM, CNN, and transformer. In contrast to previous approaches, our models take into account the part of speech of the word, which in turn can weigh heavily when words share the same written form but are syllabified differently. Our best model, a BiLSTM with a conditional random field on top, obtains an accuracy of approximately 98%, surpassing all previous state-of-the-art models. Our model represents a building block for multiple smart learning ecosystems, ranging from better hyphenation software for text evaluation to text-to-speech and speech-to-text frameworks employed in smart homes or personal assistants.
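To make the sequence-prediction framing concrete, the following minimal Python sketch shows one common way to cast syllabification as per-character labeling. The abstract does not state the label scheme or the evaluation code used in the paper, so the binary "boundary after this character" encoding, the function names (encode, decode, word_accuracy), and the hyphen separator are all assumptions made for illustration. A tagger such as a BiLSTM with a conditional random field would be trained to predict these labels; the model itself is not shown.

```python
from typing import List, Sequence, Tuple


def encode(syllabified: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Split a hyphenated word into its characters and per-character labels.

    Label 1 means "a syllable boundary follows this character", 0 means
    "no boundary". This scheme is an assumption, not the paper's.
    """
    chars: List[str] = []
    labels: List[int] = []
    for ch in syllabified:
        if ch == sep:
            if labels:  # ignore a stray leading separator
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def decode(word: str, labels: Sequence[int], sep: str = "-") -> str:
    """Rebuild the hyphenated form from a word and its predicted labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    out: List[str] = []
    last = len(word) - 1
    for i, (ch, lab) in enumerate(zip(word, labels)):
        out.append(ch)
        if lab == 1 and i < last:  # a boundary after the final letter is meaningless
            out.append(sep)
    return "".join(out)


def word_accuracy(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Fraction of words whose whole label sequence is predicted exactly."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(1 for g, p in zip(gold, pred) if list(g) == list(p))
    return correct / len(gold)


word, labels = encode("car-te")
print(word, labels)          # carte [0, 0, 1, 0, 0]
print(decode(word, labels))  # car-te
```

Under this framing, part-of-speech information could be supplied as an additional input feature for each word, which is how a single written form could receive different label sequences; how the paper actually injects that feature is not described in the abstract.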
Acknowledgements
This work was funded by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS—UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES—“Automated Text Evaluation and Simplification.”
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Corlatescu, DG., Ruseti, S., Dascalu, M. (2022). Romanian Syllabification Using Deep Neural Networks. In: Mealha, Ó., Dascalu, M., Di Mascio, T. (eds) Ludic, Co-design and Tools Supporting Smart Learning Ecosystems and Smart Education. Smart Innovation, Systems and Technologies, vol 249. Springer, Singapore. https://doi.org/10.1007/978-981-16-3930-2_8
DOI: https://doi.org/10.1007/978-981-16-3930-2_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3929-6
Online ISBN: 978-981-16-3930-2
eBook Packages: Intelligent Technologies and Robotics (R0)