DOI: 10.1145/3564121.3564801
research-article

How Provenance helps Quality Assurance Activities in AI/ML Systems

Published: 16 May 2023

ABSTRACT

Quality assurance is required for the widespread use of artificial intelligence (AI) systems in industry and society, including mission-critical areas such as medicine and disaster management. However, methods for evaluating the quality of machine learning (ML) components, especially deep neural networks, have not yet been established. Moreover, evaluators with different quality requirements and testing environments apply a variety of metrics at every stage, from data collection through experimentation to deployment. In this paper, we propose AIQPROV, a quality provenance model that records who evaluated quality, when, from which viewpoint, and how the evaluation result was used. AIQPROV focuses on human activities so that provenance can be applied to quality assurance, a field where human intervention is required. We also present an extension of the W3C PROV framework and construct a database that stores the provenance information of the quality assurance lifecycle, validating our model with 11 use cases.
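The abstract describes recording, for each quality evaluation, who performed it, when, from which viewpoint, and how its result was later used, on top of W3C PROV. The page does not give the AIQPROV schema, so the following is only a minimal sketch under stated assumptions: a hypothetical two-table SQLite layout, hypothetical helper names (`new_store`, `record_evaluation`, `record_usage`, `evaluations_of`, `uses_of_results`), and made-up identifiers such as `eval:1` and `agent:alice`. Only the relation names `prov:used`, `prov:wasGeneratedBy`, and `prov:wasAssociatedWith` are taken from PROV-O; the `viewpoint` column is an assumed stand-in for however AIQPROV actually models a quality viewpoint.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema (NOT the AIQPROV schema): one table for evaluation
# activities and one generic table of PROV-style statements.
SCHEMA = """
CREATE TABLE activity (
    id        TEXT PRIMARY KEY,  -- the evaluation activity
    viewpoint TEXT,              -- assumed attribute: quality viewpoint
    ended_at  TEXT               -- ISO 8601 time, in the role of prov:endedAtTime
);
CREATE TABLE relation (
    subject   TEXT,
    predicate TEXT,              -- e.g. prov:used, prov:wasGeneratedBy
    object    TEXT
);
"""


def new_store() -> sqlite3.Connection:
    """Create an in-memory provenance store with the hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def record_evaluation(db, act_id, evaluator, model, result, viewpoint, ended_at):
    """Store one quality evaluation: who, what, when, viewpoint, and its result."""
    db.execute("INSERT INTO activity VALUES (?, ?, ?)", (act_id, viewpoint, ended_at))
    db.executemany(
        "INSERT INTO relation VALUES (?, ?, ?)",
        [
            (act_id, "prov:wasAssociatedWith", evaluator),  # who evaluated
            (act_id, "prov:used", model),                    # what was evaluated
            (result, "prov:wasGeneratedBy", act_id),         # the evaluation result
        ],
    )


def record_usage(db, consumer, result):
    """Record how a result was used: a later activity that used it."""
    db.execute("INSERT INTO relation VALUES (?, 'prov:used', ?)", (consumer, result))


def evaluations_of(db, model) -> List[Tuple[str, str, str, str]]:
    """Return (activity, evaluator, viewpoint, time) for each evaluation of a model."""
    return db.execute(
        """
        SELECT a.id, who.object, a.viewpoint, a.ended_at
        FROM activity AS a
        JOIN relation AS u   ON u.subject = a.id
                            AND u.predicate = 'prov:used' AND u.object = ?
        JOIN relation AS who ON who.subject = a.id
                            AND who.predicate = 'prov:wasAssociatedWith'
        ORDER BY a.ended_at
        """,
        (model,),
    ).fetchall()


def uses_of_results(db, act_id) -> List[str]:
    """Return the activities that used any result generated by an evaluation."""
    rows = db.execute(
        """
        SELECT u.subject
        FROM relation AS gen
        JOIN relation AS u ON u.object = gen.subject AND u.predicate = 'prov:used'
        WHERE gen.predicate = 'prov:wasGeneratedBy' AND gen.object = ?
        """,
        (act_id,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = new_store()
    record_evaluation(db, "eval:1", "agent:alice", "model:v1", "result:1",
                      "accuracy", "2022-06-01T10:00:00Z")
    record_evaluation(db, "eval:2", "agent:bob", "model:v1", "result:2",
                      "robustness", "2022-06-03T09:30:00Z")
    record_usage(db, "activity:release-review", "result:2")
    print(evaluations_of(db, "model:v1"))
    print(uses_of_results(db, "eval:2"))
```

Running the module prints the two evaluations of `model:v1` in time order, then `['activity:release-review']`. Storing PROV statements as generic subject/predicate/object rows keeps the sketch open to new relation types at the cost of self-joins in every query; a real deployment might instead use an RDF triple store or a PROV serialization such as PROV-JSON.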

Supplemental Material

merged.mp4 (MP4 video, 70.3 MB)

Published in

AIMLSystems '22: Proceedings of the Second International Conference on AI-ML Systems
October 2022
209 pages
ISBN: 9781450398473
DOI: 10.1145/3564121

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• Research
• Refereed limited