Abstract
Context and motivation: Machine Learning (ML) algorithms and Natural Language Processing (NLP) techniques have effectively supported automatic software requirements classification. The emergence of pre-trained language models such as BERT has produced promising results in several downstream NLP tasks, including text classification.
Question/problem: Most ML/DL approaches to requirements classification lack an analysis of requirements written in Spanish. Moreover, little research has examined pre-trained language models such as fastText and BETO (BERT for the Spanish language), or validated how well such models generalize.
Principal ideas/results: We investigate the classification performance and generalization of fastText and BETO classifiers in comparison with other ML/DL algorithms. The findings show that Shallow ML algorithms outperformed fastText and BETO when trained and tested on the same dataset, but BETO outperformed the other classifiers in prediction performance on a dataset of different origin.
Contribution: Our evaluation provides a quantitative analysis of the classification performance of fastText and BETO in comparison with ML/DL algorithms, an assessment of the external validity of the trained models on another Spanish dataset, and a Spanish translation of the PROMISE NFR dataset.
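The evaluation design described above (training and testing on the same dataset, then applying the trained model to a dataset of different origin) can be illustrated with a small sketch. Everything in it is an assumption for illustration: the names NaiveBayes, macro_f1 and evaluate, the toy Spanish requirements in TRAIN_A, TEST_A and EXTERNAL_B, and the use of macro-averaged F1 as the metric are hypothetical and not taken from the paper. A multinomial Naive Bayes classifier stands in for a shallow learner only because it fits in a few lines of standard-library Python; it does not reproduce the Shallow ML, fastText, or BETO classifiers the authors compared.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would likely also
    # handle punctuation, accents, and Spanish stop words.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores.
    f1s = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


def evaluate(model, dataset):
    texts, labels = zip(*dataset)
    return macro_f1(list(labels), [model.predict(t) for t in texts])


# Hypothetical toy data: "F" = functional, "NF" = non-functional.
TRAIN_A = [
    ("el sistema debe permitir registrar usuarios", "F"),
    ("el sistema debe generar reportes mensuales", "F"),
    ("el sistema debe responder en menos de dos segundos", "NF"),
    ("la interfaz debe ser fácil de usar", "NF"),
]
TEST_A = [  # held-out split from the same (hypothetical) source
    ("el sistema debe permitir registrar pedidos", "F"),
    ("la aplicación debe responder en menos de un segundo", "NF"),
]
EXTERNAL_B = [  # hypothetical dataset of different origin
    ("los usuarios podrán descargar facturas", "F"),
    ("la plataforma estará disponible el 99% del tiempo", "NF"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit(*zip(*TRAIN_A))
    print("within-dataset macro-F1:", evaluate(model, TEST_A))
    print("cross-dataset macro-F1:", evaluate(model, EXTERNAL_B))
```

Scoring the same trained model on a held-out split of its own dataset and on an external dataset is what separates in-dataset performance from generalization: a model that relies on dataset-specific vocabulary can score well on the first and poorly on the second.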
Notes
- 1. As defined by Janiesch et al. [18], Shallow ML comprises simple artificial neural networks and other ML algorithms such as Support Vector Machine and Logistic Regression.
References
Abad, Z.S.H., Karras, O., Ghazi, P., Glinz, M., Ruhe, G., Schneider, K.: What works better? A study of classifying requirements. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference, RE 2017 (2017). https://doi.org/10.1109/RE.2017.36
AlDhafer, O., Ahmad, I., Mahmood, S.: An end-to-end deep learning system for requirements classification using recurrent neural networks. Inf. Softw. Technol. 147, 106877 (2022)
Alrumaih, H., Mirza, A., Alsalamah, H.: Toward automated software requirements classification. In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–6. IEEE (2018)
Apaza, R.D.G., Barrios, J.E.M., Becerra, D.A.I., Quispe, J.A.H.: ERS-TOOL: hybrid model for software requirements elicitation in Spanish language. In: Proceedings of the International Conference on Geoinformatics and Data Analysis, pp. 27–30 (2018)
Plaza-del Arco, F.M., Molina-González, M.D., Ureña-López, L.A., Martín-Valdivia, M.T.: Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 166, 114120 (2021)
de Arriba, A., Oriol, M., Franch, X.: Applying transfer learning to sentiment analysis in social media. In: 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp. 342–348. IEEE (2021)
Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020 (2020)
Cleland-Huang, J., Mazrouee, S., Liguo, H., Port, D.: NFR [data set], March 2007. https://doi.org/10.5281/zenodo.268542
Dalal, M.K., Zaveri, M.A.: Automatic text classification: a technical review. Int. J. Comput. Appl. 28(2), 37–40 (2011)
De Arriba, A., Oriol, M., Franch, X.: Merging datasets for emotion analysis. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 227–231. IEEE (2021)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dias Canedo, E., Cordeiro Mendes, B.: Software requirements classification using machine learning algorithms. Entropy 22(9), 1057 (2020)
Han, X., et al.: Pre-trained models: past, present and future. AI Open 2, 225–250 (2021)
Hao, Y., Dong, L., Wei, F., Xu, K.: Visualizing and understanding the effectiveness of BERT. arXiv preprint arXiv:1908.05620 (2019)
Hey, T., Keim, J., Koziolek, A., Tichy, W.F.: NoRBERT: transfer learning for requirements classification. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 169–179. IEEE (2020)
Hussain, A., Mkpojiogu, E.O., Kamal, F.M.: The role of requirements in the success or failure of software projects. Int. Rev. Manag. Mark. 6(7S), 306–311 (2016)
Instituto Cervantes: El español una lengua viva (2021). https://cvc.cervantes.es/lengua/espanol_lengua_viva/. Accessed 30 Nov 2021
Janiesch, C., Zschech, P., Heinrich, K.: Machine learning and deep learning. Electron. Mark. 31(3), 685–695 (2021). https://doi.org/10.1007/s12525-021-00475-2
Jindal, R., Malhotra, R., Jain, A.: Techniques for text classification: literature review and current trends. Webology 12(2) (2015)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. ACL, Doha, October 2014. https://doi.org/10.3115/v1/D14-1181
Kurtanovic, Z., Maalej, W.: Automatically classifying functional and non-functional requirements using supervised machine learning. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference, RE 2017 (2017). https://doi.org/10.1109/RE.2017.82
Li, G., Zheng, C., Li, M., Wang, H.: Automatic requirements classification based on graph attention network. IEEE Access 10, 30080–30090 (2022)
Li, L.F., Jin-An, N.C., Kasirun, Z.M., Chua, Y.P.: An empirical comparison of machine learning algorithms for classification of software requirements. Int. J. Adv. Comput. Sci. Appl. 10(11) (2019)
Li, Q., et al.: A survey on text classification: from shallow to deep learning. arXiv preprint arXiv:2008.00364 (2020)
Lima, M., Valle, V., Costa, E., Lira, F., Gadelha, B.: Software engineering repositories: expanding the PROMISE database. In: Proceedings of the XXXIII Brazilian Symposium on Software Engineering, pp. 427–436 (2019)
Limaylla-Lunarejo, M.I., Condori-Fernandez, N., Luaces, M.R.: Towards an automatic requirements classification in a new Spanish dataset. In: 2022 IEEE 30th International Requirements Engineering Conference (RE), pp. 270–271. IEEE (2022)
Liu, S.: Sentiment analysis of yelp reviews: a comparison of techniques and models. arXiv preprint arXiv:2004.13851 (2020)
López-Úbeda, P., Plaza-del Arco, F.M., Díaz-Galiano, M.C., Martín-Valdivia, M.T.: How successful is transfer learning for detecting anorexia on social media? Appl. Sci. 11(4), 1838 (2021)
Navarro-Almanza, R., Juarez-Ramirez, R., Licea, G.: Towards supporting software engineering using deep learning: a case of software requirements classification. In: 2017 5th International Conference in Software Engineering Research and Innovation (CONISOFT), pp. 116–120. IEEE (2017)
Qiu, X.P., Sun, T.X., Xu, Y.G., Shao, Y.F., Dai, N., Huang, X.J.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63(10), 1872–1897 (2020). https://doi.org/10.1007/s11431-020-1647-3
Quba, G.Y., Al Qaisi, H., Althunibat, A., AlZu’bi, S.: Software requirements classification using machine learning algorithm’s. In: 2021 International Conference on Information Technology (ICIT), pp. 685–690 (2021). https://doi.org/10.1109/ICIT52682.2021.9491688
Rahimi, N., Eassa, F., Elrefaei, L.: One-and two-phase software requirement classification using ensemble deep learning. Entropy 23(10), 1264 (2021)
Raschka, S.: Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808 (2018)
Regnell, B., Svensson, R.B., Wnuk, K.: Can we beat the complexity of very large-scale requirements engineering? In: Paech, B., Rolland, C. (eds.) REFSQ 2008. LNCS, vol. 5025, pp. 123–128. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69062-7_11
Sainani, A., Anish, P.R., Joshi, V., Ghaisas, S.: Extracting and classifying requirements from software engineering contracts. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 147–157. IEEE (2020)
Santos, I., Nedjah, N., de Macedo Mourelle, L.: Sentiment analysis using convolutional neural network with fastText embeddings. In: 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1–5. IEEE (2017)
Sayyad Shirabad, J., Menzies, T.: The PROMISE repository of software engineering databases. School of Information Technology and Engineering, University of Ottawa, Canada (2005). https://promise.site.uottawa.ca/SERepository
Tiun, S., Mokhtar, U., Bakar, S., Saad, S.: Classification of functional and non-functional requirement in software requirement using Word2vec and fast text. J. Phys. Conf. Ser. 1529, 042077 (2020)
Tripathy, A., Agrawal, A., Rath, S.K.: Classification of sentiment reviews using N-Gram machine learning approach. Expert Syst. Appl. 57, 117–126 (2016)
Úbeda, P.L., Díaz-Galiano, M.C., López, L.A.U., Martín-Valdivia, M.T., Martín-Noguerol, T., Luna, A.: Transfer learning applied to text classification in Spanish radiological reports. In: Proceedings of the LREC 2020 Workshop on Multilingual BIO 2020, pp. 29–32 (2020)
Umer, M., et al.: Impact of convolutional neural network and fastText embedding on text classification. Multimedia Tools Appl. 82, 1–17 (2022)
Vanjani, M., Aiken, M.: A comparison of free online machine language translators. J. Manag. Sci. Bus. Intell. 5, 26–31 (2020)
Virtanen, A., et al.: Multilingual is not enough: BERT for Finnish. arXiv preprint arXiv:1912.07076 (2019)
Xu, R., Yang, Y.: Cross-lingual distillation for text classification. arXiv preprint arXiv:1705.02073 (2017)
Zhao, L., et al.: Natural language processing (NLP) for requirements engineering: a systematic mapping study. arXiv preprint arXiv:2004.01099 (2020)
Acknowledgement
This research was partially funded by Xunta de Galicia/FEDER-UE ED413C 2021/53 (Database Lab, UDC) and Galician Ministry of Culture, Education, Professional Training, and University (grants ED431G2019/04, ED431C2022/19).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Limaylla-Lunarejo, MI., Condori-Fernandez, N., Luaces, M.R. (2023). Requirements Classification Using FastText and BETO in Spanish Documents. In: Ferrari, A., Penzenstadler, B. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2023. Lecture Notes in Computer Science, vol 13975. Springer, Cham. https://doi.org/10.1007/978-3-031-29786-1_11
DOI: https://doi.org/10.1007/978-3-031-29786-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29785-4
Online ISBN: 978-3-031-29786-1
eBook Packages: Computer Science, Computer Science (R0)