DOI: 10.1145/3643691.3648585
Research-article · Open access

PARMA: a Platform Architecture to enable Automated, Reproducible, and Multi-party Assessments of AI Trustworthiness

Published: 29 July 2024

Abstract

As AI applications emerge in diverse fields - e.g., industry, healthcare, or finance - weaknesses and failures of such applications might bear unacceptable risks, which need to be rigorously assessed, quantified, and, if necessary, mitigated. One crucial component of effective AI trustworthiness assessment and risk management is the systematic evaluation of the AI application based on properly chosen and executed tests. In addition to the known requirements of providing facilities for automated and reproducible tests, an assessment platform for Trustworthy AI must support the integration of different AI models and data sets, must be extensible with AI-risk-specific metrics and test tools, and should facilitate collaboration between model providers, assessment tool developers, and auditors. In this paper, we develop an architecture of a platform for automated, reproducible, and collaborative assessments of AI applications, based on an in-depth requirements analysis that maps use cases and collaboration scenarios to technical requirements.
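The abstract names two concrete technical requirements: the platform must be extensible with AI-risk-specific metrics and test tools, and assessment runs must be reproducible. As a rough illustration of how those two requirements might combine, the following Python sketch shows a plugin-style metric registry whose assessment runs carry a deterministic fingerprint. All names here (`AssessmentPlatform`, `Metric`, `register_metric`, etc.) are invented for illustration and are not taken from PARMA itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import hashlib
import json


class Metric(ABC):
    """An AI-risk-specific metric that third parties can plug into the platform."""

    @abstractmethod
    def evaluate(self, predictions: list, labels: list) -> float: ...


class Accuracy(Metric):
    """Minimal example metric: fraction of predictions matching the labels."""

    def evaluate(self, predictions: list, labels: list) -> float:
        return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


@dataclass(frozen=True)
class AssessmentRun:
    model_id: str
    dataset_id: str
    results: tuple     # ((metric_name, value), ...), immutable record of the run
    fingerprint: str   # deterministic digest supporting reproducibility checks


class AssessmentPlatform:
    """Registry of metrics; runs them and fingerprints each assessment."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register_metric(self, name: str, metric: Metric) -> None:
        # Extensibility point: auditors or tool developers add metrics here.
        self._metrics[name] = metric

    def assess(self, model_id: str, dataset_id: str,
               predictions: list, labels: list) -> AssessmentRun:
        results = tuple(sorted(
            (name, m.evaluate(predictions, labels))
            for name, m in self._metrics.items()
        ))
        # Hash model, data, and results so identical runs yield identical IDs.
        payload = json.dumps([model_id, dataset_id, results], sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        return AssessmentRun(model_id, dataset_id, results, digest)
```

In this sketch, repeating an assessment with the same model, data, and metric set yields the same fingerprint, which is one simple way a platform could let multiple parties verify that they are discussing the same assessment run.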


Published In

RAIE '24: Proceedings of the 2nd International Workshop on Responsible AI Engineering
April 2024, 62 pages
ISBN: 9798400705724
DOI: 10.1145/3643691
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery, New York, NY, United States
Article Metrics

  • 0 total citations
  • 187 total downloads (187 in the last 12 months, 32 in the last 6 weeks; reflects downloads up to 13 Feb 2025)
