Abstract
There is widespread demand for Artificial Intelligence (AI) software, particularly Machine Learning (ML) software, which is increasingly being adopted in applications we use daily. The quality of AI-based software differs from that of traditional software because AI-based software generally addresses distinct and more complex kinds of problems. With the rapid advance of AI technologies and related techniques, how to build high-quality AI-based software has become a prominent subject. This paper investigates the state of the art on software quality (SQ) for AI-based systems and identifies the quality attributes, applied models, challenges, and practices reported in the literature. We carried out a systematic literature review (SLR) of studies published from 1988 to 2020 to (i) analyze and understand related primary studies and (ii) synthesize limitations and open challenges to drive future research. Our study provides researchers with a road map for better understanding the quality challenges, attributes, and practices of AI-based software. Based on the empirical evidence gathered in this SLR, we suggest that future work on this topic be structured under three categories: Definition/Specification, Design/Evaluation, and Process/Socio-technical.
Ethics declarations
Conflicts of interest/Competing interests
Please find attached the paper, “Systematic Literature Review on Software Quality for AI-based Software” by Bahar Gezici and Ayça Kolukısa Tarhan, which we would like to submit for possible publication in Empirical Software Engineering. We confirm that this work is original and has not been published elsewhere nor is it currently under consideration for publication elsewhere.
For any information concerning this manuscript, please contact me preferably by e-mail at bahargezici@cs.hacettepe.edu.tr. Thank you for your consideration of this manuscript.
Additional information
Communicated by: Paolo Tonella
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
About this article
Cite this article
Gezici, B., Tarhan, A.K. Systematic literature review on software quality for AI-based software. Empir Software Eng 27, 66 (2022). https://doi.org/10.1007/s10664-021-10105-2
DOI: https://doi.org/10.1007/s10664-021-10105-2