DOI: 10.1145/3643691.3648585
Research-article · Open access

PARMA: a Platform Architecture to enable Automated, Reproducible, and Multi-party Assessments of AI Trustworthiness

Published: 29 July 2024

Abstract

As AI applications emerge in diverse fields - e.g., industry, healthcare, or finance - weaknesses and failures of such applications might bear unacceptable risks, which need to be rigorously assessed, quantified, and, if necessary, mitigated. One crucial component of effective AI trustworthiness assessment and risk management is the systematic evaluation of the AI application based on properly chosen and executed tests. In addition to the known requirements of providing facilities for automated and reproducible tests, an assessment platform for Trustworthy AI must support the integration of different AI models and data sets, must be extensible with AI-risk-specific metrics and test tools, and should facilitate collaboration between model providers, assessment tool developers, and auditors. In this paper, we develop an architecture of a platform for automated, reproducible, and collaborative assessments of AI applications, based on an in-depth requirements analysis that maps use cases and collaboration scenarios to technical requirements.
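The abstract names two concrete technical requirements: the platform must be extensible with AI-risk-specific metrics and test tools, and assessment runs must be reproducible. As a rough illustration of how those two requirements might combine, the following Python sketch shows a plugin-style metric registry whose assessment runs carry a deterministic fingerprint. All names here (`AssessmentPlatform`, `Metric`, `register_metric`, etc.) are invented for illustration and are not taken from PARMA itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import hashlib
import json


class Metric(ABC):
    """An AI-risk-specific metric that third parties can plug into the platform."""

    @abstractmethod
    def evaluate(self, predictions: list, labels: list) -> float: ...


class Accuracy(Metric):
    """Minimal example metric: fraction of predictions matching the labels."""

    def evaluate(self, predictions: list, labels: list) -> float:
        return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


@dataclass(frozen=True)
class AssessmentRun:
    model_id: str
    dataset_id: str
    results: tuple     # ((metric_name, value), ...), immutable record of the run
    fingerprint: str   # deterministic digest supporting reproducibility checks


class AssessmentPlatform:
    """Registry of metrics; runs them and fingerprints each assessment."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register_metric(self, name: str, metric: Metric) -> None:
        # Extensibility point: auditors or tool developers add metrics here.
        self._metrics[name] = metric

    def assess(self, model_id: str, dataset_id: str,
               predictions: list, labels: list) -> AssessmentRun:
        results = tuple(sorted(
            (name, m.evaluate(predictions, labels))
            for name, m in self._metrics.items()
        ))
        # Hash model, data, and results so identical runs yield identical IDs.
        payload = json.dumps([model_id, dataset_id, results], sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        return AssessmentRun(model_id, dataset_id, results, digest)
```

In this sketch, repeating an assessment with the same model, data, and metric set yields the same fingerprint, which is one simple way a platform could let multiple parties verify that they are discussing the same assessment run.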


Published In

RAIE '24: Proceedings of the 2nd International Workshop on Responsible AI Engineering
April 2024, 62 pages
ISBN: 9798400705724
DOI: 10.1145/3643691
This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery, New York, NY, United States
Article Metrics

  • 0 total citations
  • 187 total downloads (187 in the last 12 months, 32 in the last 6 weeks; reflects downloads up to 13 Feb 2025)
