Requirements and GitHub Issues: An Automated Approach for Quality Requirements Classification

Pérez-Verdejo, J. Manuel; Sánchez-García, Á. J.; Ocharán-Hernández, J. O.; Mezura-Montes, E.; Cortés-Verdín, K.

doi:10.1134/S0361768821080193

Published: 28 December 2021

Programming and Computer Software, Volume 47, pages 704–721 (2021)

Abstract

In the development of quality software, critical decisions related to planning, estimating, and managing resources depend on the correct and timely identification of system needs. In particular, classifying customer input into software requirements categories tends to become tedious and error-prone for large-scale systems. Building on a complementary systematic literature review, this research proposes applying Machine Learning techniques to automated software requirements classification. Five classification models are trained on quality attribute examples found in the available literature, and their hyperparameters are then optimized through Differential Evolution. As a case study, these models are tested on issue reports collected from five open-source projects on GitHub to identify quality-attribute-related knowledge in such user feedback. A notable outcome of the training is the identification, through the TF-IDF algorithm, of the most characteristic terms for each quality attribute. The results show a moderately high ability to classify other generic software requirements correctly, achieving a Geometric Mean of up to 82.51%. However, the same classifiers applied to issue reports showed significant difficulties in identifying information related to quality attributes, since the F-Score reached no more than 50%.
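
The two metrics reported above can be illustrated with a minimal sketch. The abstract does not state which formulation of the Geometric Mean was used; the code below assumes the common multi-class definition for imbalanced data (the n-th root of the product of per-class recalls) and the standard F1 form of the F-Score. The function names and the toy labels are hypothetical and are not taken from the article.

```python
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls


def geometric_mean(y_true, y_pred):
    """Assumed definition: n-th root of the product of per-class recalls."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return prod(recalls) ** (1 / len(recalls))


def f_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical quality attributes, not data from the study).
y_true = ["security", "security", "usability", "usability", "performance", "performance"]
y_pred = ["security", "usability", "usability", "usability", "performance", "security"]

g_mean = geometric_mean(y_true, y_pred)
f1_security = f_score(y_true, y_pred, "security")
print(f"Geometric Mean: {g_mean:.4f}")
print(f"F-Score (security): {f1_security:.4f}")
```

Because the Geometric Mean multiplies the per-class recalls, a single class with zero recall drives the whole score to zero, which is why it is often preferred over plain accuracy when classes are imbalanced.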

Author information

Authors and Affiliations

School of Statistics and Informatics, Universidad Veracruzana, 91020, Xalapa-Enríquez, Ver., Mexico
J. Manuel Pérez-Verdejo, Á. J. Sánchez-García, J. O. Ocharán-Hernández & K. Cortés-Verdín
Artificial Intelligence Research Institute, Universidad Veracruzana, 91090, Xalapa-Enríquez, Ver., Mexico
E. Mezura-Montes

Corresponding authors

Correspondence to J. Manuel Pérez-Verdejo, Á. J. Sánchez-García, J. O. Ocharán-Hernández, E. Mezura-Montes or K. Cortés-Verdín.

About this article

Cite this article

Pérez-Verdejo, J.M., Sánchez-García, Á.J., Ocharán-Hernández, J.O. et al. Requirements and GitHub Issues: An Automated Approach for Quality Requirements Classification. Program Comput Soft 47, 704–721 (2021). https://doi.org/10.1134/S0361768821080193

Received: 12 July 2021
Revised: 30 July 2021
Accepted: 15 August 2021
Published: 28 December 2021
Issue Date: December 2021
DOI: https://doi.org/10.1134/S0361768821080193