ABSTRACT
Quality assurance is required for the wide adoption of artificial intelligence (AI) systems in industry and society, including mission-critical areas such as the medical and disaster-management domains. However, methods for evaluating the quality of machine learning (ML) components, especially deep neural networks, have not yet been established. In addition, evaluators with different quality requirements and testing environments apply a variety of metrics across the lifecycle, from data collection through experimentation to deployment. In this paper, we propose a quality provenance model, AIQPROV, to record who evaluated quality, when, from which viewpoint, and how the evaluation result was used. The AIQPROV model focuses on human activities, so that it can be applied to the field of quality assurance, where human intervention is required. Moreover, we present an extension of the W3C PROV framework and build a database that stores provenance information across the quality assurance lifecycle, validating our model with 11 use cases.
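As a minimal illustration of the kind of record such a provenance model captures, the W3C PROV core triple of entity, activity, and agent can be sketched in plain Python. The class and field names below are illustrative stand-ins, not the paper's actual AIQPROV schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# W3C PROV core concepts: an Agent carries out an Activity,
# which uses and generates Entities (here, an ML model and a report).

@dataclass
class Agent:                 # prov:Agent -- who performed the evaluation
    name: str
    role: str

@dataclass
class Entity:                # prov:Entity -- an artifact used or generated
    identifier: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Activity:              # prov:Activity -- the evaluation itself
    name: str
    started_at: datetime                            # when
    associated_with: Agent                          # prov:wasAssociatedWith
    used: list = field(default_factory=list)        # prov:used
    generated: list = field(default_factory=list)   # prov:wasGeneratedBy

# A hypothetical quality-evaluation record: who, when, viewpoint, inputs, outputs.
evaluator = Agent(name="alice", role="QA engineer")
model = Entity("ex:model-v3", {"framework": "tensorflow"})
evaluation = Activity(
    name="robustness-evaluation",            # the quality viewpoint evaluated
    started_at=datetime.now(timezone.utc),
    associated_with=evaluator,
    used=[model],
)
report = Entity("ex:report-42", {"metric": "accuracy", "value": 0.93})
evaluation.generated.append(report)          # the evaluation produced a report

assert evaluation.associated_with.role == "QA engineer"
assert evaluation.generated[0].identifier == "ex:report-42"
```

A full PROV implementation would additionally serialize these records to a standard format such as PROV-O (RDF) or PROV-JSON so they can be queried across evaluations; the dataclasses here only show how the core relations line up with the who/when/viewpoint/usage questions the model answers.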