Requirements Classification Using FastText and BETO in Spanish Documents

Limaylla-Lunarejo, María-Isabel; Condori-Fernandez, Nelly; Luaces, Miguel R.

doi:10.1007/978-3-031-29786-1_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13975)

Included in the following conference series:

International Working Conference on Requirements Engineering: Foundation for Software Quality

Abstract

Context and motivation: Machine Learning (ML) algorithms and Natural Language Processing (NLP) techniques have effectively supported automatic software requirements classification. The emergence of pre-trained language models, such as BERT, has produced promising results in several downstream NLP tasks, including text classification. Question/problem: Most ML and Deep Learning (DL) approaches to requirements classification have not been analyzed for requirements written in Spanish. Moreover, little research has examined pre-trained language models such as fastText and BETO (BERT for the Spanish language) for this task, or validated how well the resulting models generalize. Principal ideas/results: We investigate the classification performance and generalization of fastText and BETO classifiers in comparison with other ML/DL algorithms. The findings show that Shallow ML algorithms outperformed fastText and BETO when trained and tested on the same dataset, but BETO outperformed the other classifiers in prediction performance on a dataset of different origin. Contribution: Our evaluation provides a quantitative analysis of the classification performance of fastText and BETO in comparison with ML/DL algorithms, an assessment of the external validity of the trained models on another Spanish dataset, and a Spanish translation of the PROMISE NFR dataset.
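
The two evaluation settings named in the abstract (testing on data from the same dataset versus testing on a dataset of different origin) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the classifier is a tiny multinomial Naive Bayes written from scratch, standing in for the fastText, BETO, and Shallow ML models of the paper, and the Spanish sentences and the F/NFR labels are invented toy data, not taken from the PROMISE NFR translation or from the authors' pipeline.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace (a deliberately naive tokenizer).
    return text.lower().split()

def train_nb(samples):
    # samples: list of (text, label) pairs; returns per-label word counts,
    # label frequencies, and the training vocabulary.
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab

def predict(model, text):
    # Multinomial Naive Bayes with add-one smoothing; unseen tokens are skipped.
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, samples):
    hits = sum(predict(model, text) == label for text, label in samples)
    return hits / len(samples)

# Hypothetical toy data: F = functional, NFR = non-functional requirement.
train = [
    ("el sistema debe permitir registrar usuarios", "F"),
    ("el sistema debe generar reportes mensuales", "F"),
    ("la respuesta debe tardar menos de dos segundos", "NFR"),
    ("la interfaz debe ser fácil de usar", "NFR"),
]
# Held-out examples written in the same style as the training data.
same_origin_test = [
    ("el sistema debe permitir generar reportes", "F"),
    ("la interfaz debe tardar menos de dos segundos", "NFR"),
]
# Examples phrased differently, imitating a dataset of different origin.
external_test = [
    ("la aplicación permitirá registrar pedidos", "F"),
    ("el tiempo de respuesta será menor a dos segundos", "NFR"),
]

model = train_nb(train)
in_acc = accuracy(model, same_origin_test)
ext_acc = accuracy(model, external_test)
print("same-origin accuracy:", in_acc)
print("external accuracy:", ext_acc)
```

Comparing the two printed numbers is the point of the sketch: a model can score well on text that resembles its training data and still degrade on requirements written by other authors, which is why the external-validity check is reported separately.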

Notes

1. As defined by [18], Shallow ML comprises simple artificial neural networks and other ML algorithms such as Support Vector Machine and Logistic Regression.
2. https://translate.google.es/.
3. https://github.com/prataffel/deep_translator.
4. https://doi.org/10.5281/zenodo.7311148.
5. https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased.
6. https://fasttext.cc/docs/en/crawl-vectors.html.
7. https://www.nltk.org/.
8. https://github.com/huggingface/transformers.
9. https://fasttext.cc/.
10. https://colab.research.google.com/.
11. https://doi.org/10.5281/zenodo.7602116.
12. https://www.deepl.com/translator.

References

Abad, Z.S.H., Karras, O., Ghazi, P., Glinz, M., Ruhe, G., Schneider, K.: What works better? A study of classifying requirements. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference, RE 2017 (2017). https://doi.org/10.1109/RE.2017.36
AlDhafer, O., Ahmad, I., Mahmood, S.: An end-to-end deep learning system for requirements classification using recurrent neural networks. Inf. Softw. Technol. 147, 106877 (2022)
Alrumaih, H., Mirza, A., Alsalamah, H.: Toward automated software requirements classification. In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–6. IEEE (2018)
Apaza, R.D.G., Barrios, J.E.M., Becerra, D.A.I., Quispe, J.A.H.: ERS-TOOL: hybrid model for software requirements elicitation in Spanish language. In: Proceedings of the International Conference on Geoinformatics and Data Analysis, pp. 27–30 (2018)
Plaza-del Arco, F.M., Molina-González, M.D., Urena-López, L.A., Martín-Valdivia, M.T.: Comparing pre-trained language models for Spanish hate speech detection. Expert Syst. Appl. 166, 114120 (2021)
de Arriba, A., Oriol, M., Franch, X.: Applying transfer learning to sentiment analysis in social media. In: 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW), pp. 342–348. IEEE (2021)
Cañete, J., Chaperon, G., Fuentes, R., Ho, J.H., Kang, H., Pérez, J.: Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020 (2020)
Cleland-Huang, J., Mazrouee, S., Liguo, H., Port, D.: NFR [data set], March 2007. https://doi.org/10.5281/zenodo.268542
Dalal, M.K., Zaveri, M.A.: Automatic text classification: a technical review. Int. J. Comput. Appl. 28(2), 37–40 (2011)
De Arriba, A., Oriol, M., Franch, X.: Merging datasets for emotion analysis. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 227–231. IEEE (2021)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dias Canedo, E., Cordeiro Mendes, B.: Software requirements classification using machine learning algorithms. Entropy 22(9), 1057 (2020)
Han, X., et al.: Pre-trained models: past, present and future. AI Open 2, 225–250 (2021)
Hao, Y., Dong, L., Wei, F., Xu, K.: Visualizing and understanding the effectiveness of BERT. arXiv preprint arXiv:1908.05620 (2019)
Hey, T., Keim, J., Koziolek, A., Tichy, W.F.: NoRBERT: transfer learning for requirements classification. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 169–179. IEEE (2020)
Hussain, A., Mkpojiogu, E.O., Kamal, F.M.: The role of requirements in the success or failure of software projects. Int. Rev. Manag. Mark. 6(7S), 306–311 (2016)
Instituto Cervantes: El español una lengua viva (2021). https://cvc.cervantes.es/lengua/espanol_lengua_viva/. Accessed 30 Nov 2021
Janiesch, C., Zschech, P., Heinrich, K.: Machine learning and deep learning. Electron. Mark. 31(3), 685–695 (2021). https://doi.org/10.1007/s12525-021-00475-2
Jindal, R., Malhotra, R., Jain, A.: Techniques for text classification: literature review and current trends. Webology 12(2) (2015)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. ACL, Doha, October 2014. https://doi.org/10.3115/v1/D14-1181
Kurtanovic, Z., Maalej, W.: Automatically classifying functional and non-functional requirements using supervised machine learning. In: Proceedings - 2017 IEEE 25th International Requirements Engineering Conference, RE 2017 (2017). https://doi.org/10.1109/RE.2017.82
Li, G., Zheng, C., Li, M., Wang, H.: Automatic requirements classification based on graph attention network. IEEE Access 10, 30080–30090 (2022)
Li, L.F., Jin-An, N.C., Kasirun, Z.M., Chua, Y.P.: An empirical comparison of machine learning algorithms for classification of software requirements. Int. J. Adv. Comput. Sci. Appl. 10(11) (2019)
Li, Q., et al.: A survey on text classification: from shallow to deep learning. arXiv preprint arXiv:2008.00364 (2020)
Lima, M., Valle, V., Costa, E., Lira, F., Gadelha, B.: Software engineering repositories: expanding the promise database. In: Proceedings of the XXXIII Brazilian Symposium on Software Engineering, pp. 427–436 (2019)
Limaylla-Lunarejo, M.I., Condori-Fernandez, N., Luaces, M.R.: Towards an automatic requirements classification in a new Spanish dataset. In: 2022 IEEE 30th International Requirements Engineering Conference (RE), pp. 270–271. IEEE (2022)
Liu, S.: Sentiment analysis of yelp reviews: a comparison of techniques and models. arXiv preprint arXiv:2004.13851 (2020)
López-Úbeda, P., Plaza-del Arco, F.M., Díaz-Galiano, M.C., Martín-Valdivia, M.T.: How successful is transfer learning for detecting anorexia on social media? Appl. Sci. 11(4), 1838 (2021)
Navarro-Almanza, R., Juarez-Ramirez, R., Licea, G.: Towards supporting software engineering using deep learning: a case of software requirements classification. In: 2017 5th International Conference in Software Engineering Research and Innovation (CONISOFT), pp. 116–120. IEEE (2017)
Qiu, X.P., Sun, T.X., Xu, Y.G., Shao, Y.F., Dai, N., Huang, X.J.: Pre-trained models for natural language processing: a survey. Sci. China Technol. Sci. 63(10), 1872–1897 (2020). https://doi.org/10.1007/s11431-020-1647-3
Quba, G.Y., Al Qaisi, H., Althunibat, A., AlZu’bi, S.: Software requirements classification using machine learning algorithm’s. In: 2021 International Conference on Information Technology (ICIT), pp. 685–690 (2021). https://doi.org/10.1109/ICIT52682.2021.9491688
Rahimi, N., Eassa, F., Elrefaei, L.: One-and two-phase software requirement classification using ensemble deep learning. Entropy 23(10), 1264 (2021)
Raschka, S.: Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808 (2018)
Regnell, B., Svensson, R.B., Wnuk, K.: Can we beat the complexity of very large-scale requirements engineering? In: Paech, B., Rolland, C. (eds.) REFSQ 2008. LNCS, vol. 5025, pp. 123–128. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69062-7_11
Sainani, A., Anish, P.R., Joshi, V., Ghaisas, S.: Extracting and classifying requirements from software engineering contracts. In: 2020 IEEE 28th International Requirements Engineering Conference (RE), pp. 147–157. IEEE (2020)
Santos, I., Nedjah, N., de Macedo Mourelle, L.: Sentiment analysis using convolutional neural network with fastText embeddings. In: 2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1–5. IEEE (2017)
Sayyad Shirabad, J., Menzies, T.: The PROMISE repository of software engineering databases. School of Information Technology and Engineering, University of Ottawa, Canada (2005). https://promise.site.uottawa.ca/SERepository
Tiun, S., Mokhtar, U., Bakar, S., Saad, S.: Classification of functional and non-functional requirement in software requirement using Word2vec and fast text. J. Phys. Conf. Ser. 1529, 042077 (2020)
Tripathy, A., Agrawal, A., Rath, S.K.: Classification of sentiment reviews using N-Gram machine learning approach. Expert Syst. Appl. 57, 117–126 (2016)
Úbeda, P.L., Díaz-Galiano, M.C., López, L.A.U., Martín-Valdivia, M.T., Martín-Noguerol, T., Luna, A.: Transfer learning applied to text classification in Spanish radiological reports. In: Proceedings of the LREC 2020 Workshop on Multilingual BIO 2020, pp. 29–32 (2020)
Umer, M., et al.: Impact of convolutional neural network and fastText embedding on text classification. Multimedia Tools Appl. 82, 1–17 (2022)
Vanjani, M., Aiken, M.: A comparison of free online machine language translators. J. Manag. Sci. Bus. Intell 5, 26–31 (2020)
Virtanen, A., et al.: Multilingual is not enough: BERT for Finnish. arXiv preprint arXiv:1912.07076 (2019)
Xu, R., Yang, Y.: Cross-lingual distillation for text classification. arXiv preprint arXiv:1705.02073 (2017)
Zhao, L., et al.: Natural language processing (NLP) for requirements engineering: a systematic mapping study. arXiv preprint arXiv:2004.01099 (2020)

Acknowledgement

This research was partially funded by Xunta de Galicia/FEDER-UE ED413C 2021/53 (Database Lab, UDC) and Galician Ministry of Culture, Education, Professional Training, and University (grants ED431G2019/04, ED431C2022/19).

Author information

Authors and Affiliations

Fac. Informática, Database Lab., Universidade da Coruña, CITIC, A Coruña, Spain
María-Isabel Limaylla-Lunarejo & Miguel R. Luaces
CITIUS, Universidad de Santiago de Compostela, Santiago, Spain
Nelly Condori-Fernandez
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Nelly Condori-Fernandez

Corresponding author

Correspondence to Nelly Condori-Fernandez.

Editor information

Editors and Affiliations

CNR ISTI, Pisa, Italy
Alessio Ferrari
Chalmers Tekniska Högskola, Gothenburg, Sweden
Birgit Penzenstadler

Cite this paper

Limaylla-Lunarejo, MI., Condori-Fernandez, N., Luaces, M.R. (2023). Requirements Classification Using FastText and BETO in Spanish Documents. In: Ferrari, A., Penzenstadler, B. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2023. Lecture Notes in Computer Science, vol 13975. Springer, Cham. https://doi.org/10.1007/978-3-031-29786-1_11

DOI: https://doi.org/10.1007/978-3-031-29786-1_11
Published: 04 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29785-4
Online ISBN: 978-3-031-29786-1
eBook Packages: Computer Science (R0)