Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 249)

Abstract

Syllabification may seem trivial for humans, but it can be a challenging task for automated text analysis. In this study, we explore three approaches to syllabifying Romanian words using state-of-the-art deep learning architectures for sequence prediction, namely BiLSTM, CNN, and transformer. In contrast to previous approaches, our models take into account the part of speech of the word, which in turn can weigh heavily in situations where words have the same written form but different syllabifications. Our best model obtains an accuracy of approximately 98% using a conditional random field on top of the BiLSTM architecture, surpassing all previous state-of-the-art models. Our model represents a building block for multiple smart learning ecosystems, ranging from better hyphenation software for text evaluation to text-to-speech and speech-to-text frameworks employed in intelligent houses or personal assistants.
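To make the sequence-prediction framing concrete, the sketch below shows one common way to cast syllabification as per-character labeling: each character receives a 1 if a syllable boundary follows it and a 0 otherwise. The abstract does not state the label scheme used in the chapter, so this binary encoding, the helper names `to_labels` and `from_labels`, and the example pair "hai-nă" (noun) versus "ha-i-nă" (adjective) are illustrative assumptions rather than the authors' setup.

```python
from __future__ import annotations


def to_labels(syllabified: str, sep: str = "-") -> tuple[str, list[int]]:
    """Split a syllabified word into its plain form and boundary labels.

    Label 1 means "a syllable boundary follows this character".
    This binary scheme is an assumption made for illustration.
    """
    chars: list[str] = []
    labels: list[int] = []
    for ch in syllabified:
        if ch == sep:
            # Mark the boundary on the character just before the separator.
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def from_labels(word: str, labels: list[int], sep: str = "-") -> str:
    """Rebuild the syllabified form from a word and its predicted labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    pieces: list[str] = []
    for ch, label in zip(word, labels):
        pieces.append(ch)
        if label == 1:
            pieces.append(sep)
    # A boundary after the final character is meaningless, so drop it.
    return "".join(pieces).rstrip(sep)


# Same written form, different syllabification depending on part of speech
# (hypothetical training pairs; the POS tags are illustrative).
examples = [("hai-nă", "NOUN"), ("ha-i-nă", "ADJ")]
for syllabified, pos in examples:
    word, labels = to_labels(syllabified)
    print(word, pos, labels, from_labels(word, labels))
```

Under this framing, a tagger such as a BiLSTM with a CRF output layer would predict the label sequence from the characters, and a part-of-speech feature gives it the extra signal needed to tell the two readings of the same spelling apart.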

Acknowledgements

This work was funded by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS—UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES—“Automated Text Evaluation and Simplification.”

Author information

Corresponding author

Correspondence to Mihai Dascalu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Corlatescu, DG., Ruseti, S., Dascalu, M. (2022). Romanian Syllabification Using Deep Neural Networks. In: Mealha, Ó., Dascalu, M., Di Mascio, T. (eds) Ludic, Co-design and Tools Supporting Smart Learning Ecosystems and Smart Education. Smart Innovation, Systems and Technologies, vol 249. Springer, Singapore. https://doi.org/10.1007/978-981-16-3930-2_8
