Requirements Classification Using FastText and BETO in Spanish Documents

  • Conference paper
Requirements Engineering: Foundation for Software Quality (REFSQ 2023)

Abstract

Context and motivation: Machine Learning (ML) algorithms and Natural Language Processing (NLP) techniques have effectively supported automatic software requirements classification. Pre-trained language models such as BERT have shown promising results in several downstream NLP tasks, including text classification. Question/problem: Most ML and Deep Learning (DL) approaches to requirements classification have not been analyzed on requirements written in Spanish. Moreover, there has been little research on pre-trained language models such as fastText and BETO (BERT for the Spanish language) for this task, or on validating how well the resulting models generalize. Principal ideas/results: We investigate the classification performance and generalization of fastText and BETO classifiers in comparison with other ML/DL algorithms. The findings show that Shallow ML algorithms outperformed fastText and BETO when trained and tested on the same dataset, but BETO outperformed the other classifiers when predicting on a dataset of different origin. Contribution: Our evaluation provides a quantitative analysis of the classification performance of fastText and BETO in comparison with ML/DL algorithms, an assessment of the external validity of the trained models on another Spanish dataset, and a Spanish translation of the PROMISE NFR dataset.
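The two evaluation settings contrasted in the abstract (testing on held-out data from the training dataset versus testing on a dataset of different origin) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name a metric, so macro-averaged F1 is used here; `WordVoteClassifier` is a hypothetical toy stand-in rather than fastText, BETO, or any of the paper's Shallow ML models; and the Spanish sentences and their F/NF labels are invented and do not come from PROMISE NFR or any dataset used in the paper.

```python
from collections import Counter, defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumes non-empty input)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


class WordVoteClassifier:
    """Hypothetical toy model: for each class, sums how often the words of a
    text occurred in training examples of that class, then picks the class
    with the highest total."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.counts[word][label] += 1
        # Fallback for texts that share no word with the training data.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        preds = []
        for text in texts:
            votes = Counter()
            for word in text.lower().split():
                votes.update(self.counts.get(word, {}))
            preds.append(votes.most_common(1)[0][0] if votes else self.default)
        return preds


# Invented examples: F = functional, NF = non-functional requirement.
train_a = [
    ("el sistema debe permitir registrar usuarios", "F"),
    ("el sistema debe generar reportes mensuales", "F"),
    ("la respuesta debe tardar menos de dos segundos", "NF"),
    ("la interfaz debe ser fácil de usar", "NF"),
]
test_a = [  # held out, same origin as train_a
    ("el sistema debe permitir generar reportes", "F"),
    ("la respuesta debe ser menos de un segundo", "NF"),
]
test_b = [  # different origin, used only for the external check
    ("la aplicación permitirá emitir facturas", "F"),
    ("la página cargará en menos de tres segundos", "NF"),
]

texts, labels = zip(*train_a)
clf = WordVoteClassifier().fit(texts, labels)

for name, data in [("same dataset", test_a), ("external dataset", test_b)]:
    x, y = zip(*data)
    print(name, round(macro_f1(list(y), clf.predict(x)), 3))
```

The point of the sketch is the protocol, not the model: the classifier is fitted once, and the external dataset is never seen during training, so the second score reflects how well the model transfers to requirements of a different origin.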

Notes

  1. Defined by [18]: Shallow ML comprises simple artificial neural networks and other ML algorithms such as Support Vector Machine and Logistic Regression.

  2. https://translate.google.es/.

  3. https://github.com/prataffel/deep_translator.

  4. https://doi.org/10.5281/zenodo.7311148.

  5. https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased.

  6. https://fasttext.cc/docs/en/crawl-vectors.html.

  7. https://www.nltk.org/.

  8. https://github.com/huggingface/transformers.

  9. https://fasttext.cc/.

  10. https://colab.research.google.com/.

  11. https://doi.org/10.5281/zenodo.7602116.

  12. https://www.deepl.com/translator.

References

  1. Abad, Z.S.H., Karras, O., Ghazi, P., Glinz, M., Ruhe, G., Schneider, K.: What works better? A study of classifying requirements. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference, RE 2017 (2017). https://doi.org/10.1109/RE.2017.36

  2. AlDhafer, O., Ahmad, I., Mahmood, S.: An end-to-end deep learning system for requirements classification using recurrent neural networks. Inf. Softw. Technol. 147, 106877 (2022)

  3. Alrumaih, H., Mirza, A., Alsalamah, H.: Toward automated software requirements classification. In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–6. IEEE (2018)

  4. Apaza, R.D.G., Barrios, J.E.M., Becerra, D.A.I., Quispe, J.A.H.: ERS-TOOL: hybrid model for software requirements elicitation in Spanish language. In: Proceedings of the International Conference on Geoinformatics and Data Analysis, pp. 27–30 (2018)

  5. Plaza-del Arco, F.M., Molina-González, M.D., Urena-López, L.A., Martín-Valdivia, M.T.: Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 166, 114120 (2021)

  6. de Arriba, A., Oriol, M., Franch, X.: Applying transfer learning to sentiment analysis in social media. In: 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp. 342–348. IEEE (2021)

  7. Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020 (2020)

  8. Cleland-Huang, J., Mazrouee, S., Liguo, H., Port, D.: NFR [data set], March 2007. https://doi.org/10.5281/zenodo.268542

  9. Dalal, M.K., Zaveri, M.A.: Automatic text classification: a technical review. Int. J. Comput. Appl. 28(2), 37–40 (2011)

  10. De Arriba, A., Oriol, M., Franch, X.: Merging datasets for emotion analysis. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 227–231. IEEE (2021)

  11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  12. Dias Canedo, E., Cordeiro Mendes, B.: Software requirements classification using machine learning algorithms. Entropy 22(9), 1057 (2020)

  13. Han, X., et al.: Pre-trained models: past, present and future. AI Open 2, 225–250 (2021)

  14. Hao, Y., Dong, L., Wei, F., Xu, K.: Visualizing and understanding the effectiveness of BERT. arXiv preprint arXiv:1908.05620 (2019)

  15. Hey, T., Keim, J., Koziolek, A., Tichy, W.F.: NoRBERT: transfer learning for requirements classification. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 169–179. IEEE (2020)

  16. Hussain, A., Mkpojiogu, E.O., Kamal, F.M.: The role of requirements in the success or failure of software projects. Int. Rev. Manag. Mark. 6(7S), 306–311 (2016)

  17. Instituto Cervantes: El español una lengua viva (2021). https://cvc.cervantes.es/lengua/espanol_lengua_viva/. Accessed 30 Nov 2021

  18. Janiesch, C., Zschech, P., Heinrich, K.: Machine learning and deep learning. Electron. Mark. 31(3), 685–695 (2021). https://doi.org/10.1007/s12525-021-00475-2

  19. Jindal, R., Malhotra, R., Jain, A.: Techniques for text classification: literature review and current trends. Webology 12(2) (2015)

  20. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)

  21. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. ACL, Doha, October 2014. https://doi.org/10.3115/v1/D14-1181

  22. Kurtanovic, Z., Maalej, W.: Automatically classifying functional and non-functional requirements using supervised machine learning. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference, RE 2017 (2017). https://doi.org/10.1109/RE.2017.82

  23. Li, G., Zheng, C., Li, M., Wang, H.: Automatic requirements classification based on graph attention network. IEEE Access 10, 30080–30090 (2022)

  24. Li, L.F., Jin-An, N.C., Kasirun, Z.M., Chua, Y.P.: An empirical comparison of machine learning algorithms for classification of software requirements. Int. J. Adv. Comput. Sci. Appl. 10(11) (2019)

  25. Li, Q., et al.: A survey on text classification: from shallow to deep learning. arXiv preprint arXiv:2008.00364 (2020)

  26. Lima, M., Valle, V., Costa, E., Lira, F., Gadelha, B.: Software engineering repositories: expanding the promise database. In: Proceedings of the XXXIII Brazilian Symposium on Software Engineering, pp. 427–436 (2019)

  27. Limaylla-Lunarejo, M.I., Condori-Fernandez, N., Luaces, M.R.: Towards an automatic requirements classification in a new Spanish dataset. In: 2022 IEEE 30th International Requirements Engineering Conference (RE), pp. 270–271. IEEE (2022)

  28. Liu, S.: Sentiment analysis of yelp reviews: a comparison of techniques and models. arXiv preprint arXiv:2004.13851 (2020)

  29. López-Úbeda, P., Plaza-del Arco, F.M., Díaz-Galiano, M.C., Martín-Valdivia, M.T.: How successful is transfer learning for detecting anorexia on social media? Appl. Sci. 11(4), 1838 (2021)

  30. Navarro-Almanza, R., Juarez-Ramirez, R., Licea, G.: Towards supporting software engineering using deep learning: a case of software requirements classification. In: 2017 5th International Conference in Software Engineering Research and Innovation (CONISOFT), pp. 116–120. IEEE (2017)

  31. Qiu, X.P., Sun, T.X., Xu, Y.G., Shao, Y.F., Dai, N., Huang, X.J.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63(10), 1872–1897 (2020). https://doi.org/10.1007/s11431-020-1647-3

  32. Quba, G.Y., Al Qaisi, H., Althunibat, A., AlZu’bi, S.: Software requirements classification using machine learning algorithm’s. In: 2021 International Conference on Information Technology (ICIT), pp. 685–690 (2021). https://doi.org/10.1109/ICIT52682.2021.9491688

  33. Rahimi, N., Eassa, F., Elrefaei, L.: One-and two-phase software requirement classification using ensemble deep learning. Entropy 23(10), 1264 (2021)

  34. Raschka, S.: Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808 (2018)

  35. Regnell, B., Svensson, R.B., Wnuk, K.: Can we beat the complexity of very large-scale requirements engineering? In: Paech, B., Rolland, C. (eds.) REFSQ 2008. LNCS, vol. 5025, pp. 123–128. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69062-7_11

  36. Sainani, A., Anish, P.R., Joshi, V., Ghaisas, S.: Extracting and classifying requirements from software engineering contracts. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 147–157. IEEE (2020)

  37. Santos, I., Nedjah, N., de Macedo Mourelle, L.: Sentiment analysis using convolutional neural network with fastText embeddings. In: 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1–5. IEEE (2017)

  38. Sayyad Shirabad, J., Menzies, T.: The PROMISE repository of software engineering databases. School of Information Technology and Engineering, University of Ottawa, Canada (2005). https://promise.site.uottawa.ca/SERepository

  39. Tiun, S., Mokhtar, U., Bakar, S., Saad, S.: Classification of functional and non-functional requirement in software requirement using Word2vec and fast text. J. Phys. Conf. Ser. 1529, 042077 (2020)

  40. Tripathy, A., Agrawal, A., Rath, S.K.: Classification of sentiment reviews using N-Gram machine learning approach. Expert Syst. Appl. 57, 117–126 (2016)

  41. Úbeda, P.L., Díaz-Galiano, M.C., López, L.A.U., Martín-Valdivia, M.T., Martín-Noguerol, T., Luna, A.: Transfer learning applied to text classification in Spanish radiological reports. In: Proceedings of the LREC 2020 Workshop on Multilingual BIO 2020, pp. 29–32 (2020)

  42. Umer, M., et al.: Impact of convolutional neural network and fastText embedding on text classification. Multimedia Tools Appl. 82, 1–17 (2022)

  43. Vanjani, M., Aiken, M.: A comparison of free online machine language translators. J. Manag. Sci. Bus. Intell 5, 26–31 (2020)

  44. Virtanen, A., et al.: Multilingual is not enough: BERT for Finnish. arXiv preprint arXiv:1912.07076 (2019)

  45. Xu, R., Yang, Y.: Cross-lingual distillation for text classification. arXiv preprint arXiv:1705.02073 (2017)

  46. Zhao, L., et al.: Natural language processing (NLP) for requirements engineering: a systematic mapping study. arXiv preprint arXiv:2004.01099 (2020)

Acknowledgement

This research was partially funded by Xunta de Galicia/FEDER-UE ED413C 2021/53 (Database Lab, UDC) and Galician Ministry of Culture, Education, Professional Training, and University (grants ED431G2019/04, ED431C2022/19).

Author information

Corresponding author

Correspondence to Nelly Condori-Fernandez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Limaylla-Lunarejo, MI., Condori-Fernandez, N., Luaces, M.R. (2023). Requirements Classification Using FastText and BETO in Spanish Documents. In: Ferrari, A., Penzenstadler, B. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2023. Lecture Notes in Computer Science, vol 13975. Springer, Cham. https://doi.org/10.1007/978-3-031-29786-1_11

  • DOI: https://doi.org/10.1007/978-3-031-29786-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29785-4

  • Online ISBN: 978-3-031-29786-1

  • eBook Packages: Computer Science, Computer Science (R0)
