Systematic literature review on software quality for AI-based software

Gezici, Bahar; Tarhan, Ayça Kolukısa

doi:10.1007/s10664-021-10105-2


Published: 17 March 2022

Volume 27, article number 66, (2022)

Empirical Software Engineering


Abstract

There is widespread demand for Artificial Intelligence (AI) software, specifically Machine Learning (ML), which is increasingly popular and is being adopted in many applications we use daily. The quality of AI-based software differs from that of traditional software because AI-based software generally addresses distinct and more complex kinds of problems. With the rapid advance of AI technologies and related techniques, how to build high-quality AI-based software has become a prominent subject. This paper investigates the state of the art on software quality (SQ) for AI-based systems and identifies the quality attributes, applied models, challenges, and practices reported in the literature. We carried out a systematic literature review (SLR) covering the period from 1988 to 2020 to (i) analyze and understand related primary studies and (ii) synthesize limitations and open challenges to drive future research. Our study provides a road map that helps researchers better understand quality challenges, attributes, and practices in the context of software quality for AI-based software. Based on the empirical evidence gathered in this SLR, we suggest that future work on this topic be structured under three categories: Definition/Specification, Design/Evaluation, and Process/Socio-technical.



References

ISO (2005) The ISO/IEC 25000 series of standards. https://iso25000.com/index.php/en/iso-25000-standards
ISO (2008) ISO/IEC 25012:2008 Software engineering - Software product Quality Requirements and Evaluation (SQuaRE) - Data quality model. https://www.iso.org/standard/35736.html
ISO (2018) ISO 26262-1:2018 Road vehicles - Functional safety. https://www.iso.org/standard/68383.html
ISO (2013) ISO/IEC/IEEE 29119-1:2013 Software and systems engineering - Software testing. https://www.iso.org/standard/45142.html
ISO (2001) ISO/IEC 9126-1:2001 Software engineering - Product quality. https://www.iso.org/standard/22749.html
Aggarwal A, Lohia P, Nagar S, Dey K, Saha D (2019) Black box fairness testing of machine learning models. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 625–635
Alamin MAA, Uddin G (2021) Quality assurance challenges for machine learning software applications during software development life cycle phases. arXiv:2105.01195
Ali Z Quality measurement challenges for artificial intelligence software
de Almeida Biolchini JC, Mian PG, Natali ACC, Conte T, Travassos GH (2007) Scientific research ontology to support systematic review in software engineering. Adv Eng Inform 21(2):133–151
Arpteg A, Brinne B, Crnkovic-Friis L, Bosch J (2018) Software engineering challenges of deep learning. In: 2018 44th Euromicro conference on software engineering and advanced applications (SEAA). IEEE, pp 50–59
Borg M, Englund C, Wnuk K, Duran B, Levandowski C, Gao S, Tan Y, Kaijser H, Lönn H, Törnqvist J (2018) Safely entering the deep: A review of verification and validation for machine learning and a challenge elicitation in the automotive industry. arXiv:1812.05389
Bosch J, Olsson HH, Crnkovic I (2021) Engineering AI systems: a research agenda. In: Artificial intelligence paradigms for smart cyber-physical systems. IGI Global, pp 1–19
Bourque P, Dupuis R, Abran A, Moore JW, Tripp L (2004) Guide to the software engineering body of knowledge -
Braiek HB, Khomh F (2020) On testing machine learning programs. J Syst Softw 164:110542
Byrne C (2017) Development Workflows for Data Scientists. O’Reilly Media
Chen R, Bastani FB, Tsao TW (1995) On the reliability of AI planning software in real-time applications. IEEE Trans Knowl Data Eng 7(1):4–13
Cummaudo A, Vasa R, Grundy J, Abdelrazek M, Cain A (2019) Losing confidence in quality: Unspoken evolution of computer vision services. In: 2019 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 333–342
Deng L (2018) Artificial intelligence in the rising wave of deep learning: The historical path and future outlook [perspectives]. IEEE Signal Proc Mag 35(1):180–177
Forward A, Lethbridge TC (2008) A taxonomy of software types to facilitate search and evidence-based software engineering. In: Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds, pp 179–191
Garousi V, Felderer M, Mäntylä MV (2016) The need for multivocal literature reviews in software engineering: complementing systematic literature reviews with grey literature. In: Proceedings of the 20th international conference on evaluation and assessment in software engineering, pp 1–6
Geske F, Hofmann P, Lämmermann L, Schlatt V, Urbach N (2021) Gateways to artificial intelligence: Developing a taxonomy for AI service platforms. In: Twenty-ninth European conference on information systems (ECIS)
Gezici B, Tarhan AK (2019) Final pool. https://drive.google.com/file/d/1ve6BpJTrITsfo6auSoWKh48ajWbNb05n/view?usp=sharing
Hamada K, Ishikawa F, Masuda S, Matsuya M, Ujita Y (2020) Guidelines for quality assurance of machine learning-based artificial intelligence. In: SEKE2020: The 32nd international conference on software engineering & knowledge engineering, pp 335–341
Hannousse A (2021) Searching relevant papers for software engineering secondary studies: Semantic scholar coverage and identification role. IET Softw 15(1):126–146
Henriksson J, Borg M, Englund C (2018) Automotive safety and machine learning: Initial results from a study on how to adapt the ISO 26262 safety standard. In: 2018 IEEE/ACM 1st international workshop on software engineering for AI in autonomous systems (SEFAIAS). IEEE, pp 47–49
Hopgood AA (2005) The state of artificial intelligence. Adv Comput 65:1–75
Horkoff J (2019) Non-functional requirements for machine learning: Challenges and new directions. In: 2019 IEEE 27th international requirements engineering conference (RE). IEEE, pp 386–391
Hyun Park S, Seon Shin W, Hyun Park Y, Lee Y (2017) Building a new culture for quality management in the era of the fourth industrial revolution. Total Qual Manag Bus Excell 28(9-10):934–945
Ishikawa F, Yoshioka N (2019) How do engineers perceive difficulties in engineering of machine-learning systems?-questionnaire survey. In: 2019 IEEE/ACM Joint 7th international workshop on conducting empirical studies in industry (CESI) and 6th international workshop on software engineering research and industrial practice (SER&IP). IEEE, pp 2–9
ISO/IEC (2011) ISO/IEC 25010 (2011) - Systems and software quality requirements and evaluation (SQuaRE) - System and software quality models. International Standard ISO/IEC 25010 2(1):1–25
Ivarsson M, Gorschek T (2011) A method for evaluating rigor and industrial relevance of technology evaluations. Empir Softw Eng 16(3):365–395
Kitchenham B (2004) Procedures for performing systematic reviews. Keele, UK. Keele Univ 33(2004):1–26
Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering -
Kuwajima H, Ishikawa F (2019) Adapting SQuaRE for quality assessment of artificial intelligence systems. In: 2019 IEEE International symposium on software reliability engineering workshops (ISSREW). IEEE, pp 13–18
Kuwajima H, Yasuoka H, Nakae T (2018) Open problems in engineering and quality assurance of safety critical machine learning systems. arXiv:1812.03057
Kuwajima H, Yasuoka H, Nakae T (2020) Engineering problems in machine learning systems. Mach Learn 109(5):1103–1126
Lakshen GA, Vraneš S, Janev V (2016) Big data and quality: A literature review. In: 2016 24th telecommunications forum (TELFOR). IEEE, pp 1–4
Lenarduzzi V, Lomio F, Moreschini S, Taibi D, Tamburri DA (2021) Software quality for AI: Where we are now? In: International conference on software quality. Springer, pp 43–53
Liu Y, Ma L, Zhao J (2019) Secure deep learning engineering: a road towards quality assurance of intelligent systems. In: International conference on formal engineering methods. Springer, pp 3–15
Lwakatare LE, Raj A, Crnkovic I, Bosch J, Olsson HH (2020) Large-scale machine learning systems in real-world industrial settings: a review of challenges and solutions. Inf Softw Technol 127:106368
Malik V, Singh S (2020) Artificial intelligent environments: risk management and quality assurance implementation. J Discret Math Sci Cryptogr 23(1):187–195
Mannarswamy S, Roy S, Chidambaram S (2020) Tutorial on software testing & quality assurance for machine learning applications from research bench to real world. In: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pp 373–374
Martínez-Fernández S, Bogner J, Franch X, Oriol M, Siebert J, Trendowicz A, Vollmer AM, Wagner S (2021) Software engineering for AI-based systems: A survey. arXiv:2105.01984
Masuda S, Ono K, Yasue T, Hosokawa N (2018) A survey of software quality for machine learning applications. In: 2018 IEEE International conference on software testing, verification and validation workshops (ICSTW). IEEE, pp 279–284
Murphy C, Kaiser GE, Arias M (2006) A framework for quality assurance of machine learning applications -
Nakajima S (2018) Quality assurance of machine learning software. In: 2018 IEEE 7th global conference on consumer electronics (GCCE). IEEE, pp 601–604
Nakajima S (2019) Distortion and faults in machine learning software. In: International workshop on structured object-oriented formal language and method. Springer, pp 29–41
Nakamichi K, Ohashi K, Namba I, Yamamoto R, Aoyama M, Joeckel L, Siebert J, Heidrich J (2020) Requirements-driven method to determine quality characteristics and measurements for machine learning software and its evaluation. In: 2020 IEEE 28th international requirements engineering conference (RE). IEEE, pp 260–270
Nascimento E, Nguyen-Duc A, Sundbø I, Conte T (2020) Software engineering for artificial intelligence and machine learning software: A systematic literature review. arXiv:2011.03751
Nguyen-Duc A, Abrahamsson P (2020) Continuous experimentation on artificial intelligence software: a research agenda. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 1513–1516
Nishi Y, Masuda S, Ogawa H, Uetsuki K (2018) A test architecture for machine learning product. In: 2018 IEEE International conference on software testing, verification and validation workshops (ICSTW). IEEE, pp 273–278
Ongsulee P (2017) Artificial intelligence, machine learning and deep learning. In: 2017 15th international conference on ICT and knowledge engineering (ICT&KE). IEEE, pp 1–6
Petersen K, Vakkalanka S, Kuzniarz L (2015) Guidelines for conducting systematic mapping studies in software engineering: an update. Inf Softw Technol 64:1–18
Pons L, Ozkaya I (2019) Priority quality attributes for engineering AI-enabled systems. arXiv:1911.02912
Poth A, Meyer B, Schlicht P, Riel A (2020) Quality assurance for machine learning–an approach to function and system safeguarding. In: 2020 IEEE 20th international conference on software quality, reliability and security (QRS). IEEE, pp 22–29
Rahman MS, Reza H (2020) Systematic mapping study of non-functional requirements in big data system. In: 2020 IEEE International conference on electro information technology (EIT). IEEE, pp 025–031
Riccio V, Jahangirova G, Stocco A, Humbatova N, Weiss M, Tonella P (2020) Testing machine learning based systems: a systematic mapping. Empir Softw Eng 25(6):5193–5254
Rushby J (1988) Quality measures and assurance for AI software, vol 18. National Aeronautics and Space Administration, Scientific and Technical Information Division
Russell S, Norvig P (2009) Artificial intelligence: a modern approach, English
Samoili S, Cobo ML, Gomez E, De Prato G, Martinez-Plumed F, Delipetrev B (2020) AI Watch. Defining artificial intelligence. Towards an operational definition and taxonomy of artificial intelligence. In: JRC Technical reports. Joint Research Centre (Seville site)
Siebert J, Joeckel L, Heidrich J, Nakamichi K, Ohashi K, Namba I, Yamamoto R, Aoyama M (2020) Towards guidelines for assessing qualities of machine learning systems. In: International conference on the quality of information and communications technology. Springer, pp 17–31
Taleb I, Serhani MA, Dssouli R (2018) Big data quality: a survey. In: 2018 IEEE International congress on big data (bigdata congress). IEEE, pp 166–173
Tao C, Gao J, Wang T (2019) Testing and quality validation for AI software–perspectives, issues, and practices. IEEE Access 7:120164–120175
Tao C, Hao C, Gao J, Wang T, Wen W (2017) A practical study on quality evaluation for age recognition systems. In: SEKE, pp 345–350
Tsintzira AA, Arvanitou EM, Ampatzoglou A, Chatzigeorgiou A (2020) Applying machine learning in technical debt management: Future opportunities and challenges. In: International conference on the quality of information and communications technology. Springer, pp 53–67
Turhan B, Kutlubay O (2007) Mining software data. In: 2007 IEEE 23rd international conference on data engineering workshop. IEEE, pp 912–916
Vinayagasundaram B, Srivatsa S (2007) Software quality in artificial intelligence system. Inf Technol J 6(6):835–842
Vogelsang A, Borg M (2019) Requirements engineering for machine learning: Perspectives from data scientists. In: 2019 IEEE 27th international requirements engineering conference workshops (REW). IEEE, pp 245–251
Wan Z, Xia X, Lo D, Murphy GC (2019) How does machine learning change software development practices? IEEE Transactions on Software Engineering
Wieringa RJ (2014) Design science methodology for information systems and software engineering. Springer
Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering, pp 1–10
Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Science & Business Media
Zhang JM, Harman M, Ma L, Liu Y (2020) Machine learning testing: survey, landscapes and horizons. IEEE Transactions on Software Engineering
Zhang P, Cao W, Muccini H (2020) Quality assurance technologies of big data applications: A systematic literature review. arXiv:2002.01759


Author information

Authors and Affiliations

Institute of Science, Computer Engineering Department, Hacettepe University, Ankara, Turkey
Bahar Gezici & Ayça Kolukısa Tarhan


Corresponding author

Correspondence to Bahar Gezici.

Ethics declarations

Conflicts of Interests/Competing interests

Please find attached the paper, “Systematic Literature Review on Software Quality for AI-based Software” by Bahar Gezici and Ayça Kolukısa Tarhan, which we would like to submit for possible publication in Empirical Software Engineering. We confirm that this work is original and has not been published elsewhere, nor is it currently under consideration for publication elsewhere.

For any information concerning this manuscript, please contact me preferably by e-mail at bahargezici@cs.hacettepe.edu.tr. Thank you for your consideration of this manuscript.

Additional information

Communicated by: Paolo Tonella

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 18 Mapping between each primary study ID, e.g. P1, P2, and the reference to the corresponding paper


Table 19 Detailed information per study for RQ 2.2 (addressed challenges of quality) and RQ 4.1 (how these challenges are addressed)


Table 20 Details of bottom-up approach followed in this SLR for relations of metrics and quality attributes used in primary studies



About this article

Cite this article

Gezici, B., Tarhan, A.K. Systematic literature review on software quality for AI-based software. Empir Software Eng 27, 66 (2022). https://doi.org/10.1007/s10664-021-10105-2


Accepted: 09 December 2021
Published: 17 March 2022
DOI: https://doi.org/10.1007/s10664-021-10105-2

