Abstract
The continuing growth of Artificial Intelligence (AI) adoption across enterprises and governments worldwide has fueled demand for trustworthy AI systems and applications. This need spans so-called Explainable or Interpretable AI through Responsible AI, driven by the underlying requirement for greater confidence in deploying AI as part of enterprise IT. AI-based use cases, both internal to organizations and external, customer- and user-facing, are increasingly expected to meet these demands. This paper describes the need for and definitions of trustworthiness and responsibility in AI systems, summarizes currently popular AI benchmarks, and discusses the challenges and opportunities in assessing and benchmarking the trustworthy and responsible aspects of AI systems and applications.
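The paper itself does not prescribe a particular metric, but as a rough illustration of what benchmarking trustworthiness beyond raw accuracy could involve, the sketch below scores each prediction of a classifier with a nearest-neighbor trust score (in the spirit of Jiang et al., "To trust or not to trust a classifier"): the ratio of the distance to the nearest training example of any other class over the distance to the nearest example of the predicted class. The synthetic dataset, logistic-regression model, and reporting format are illustrative assumptions, not part of this paper.

```python
# Minimal sketch (assumptions as noted above): report a per-prediction
# nearest-neighbor trust score alongside accuracy. Higher ratios suggest
# the prediction lies closer to its own class than to any other class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
preds = clf.predict(X_te)

def trust_scores(X_train, y_train, X_test, y_pred):
    scores = []
    for x, c in zip(X_test, y_pred):
        d = np.linalg.norm(X_train - x, axis=1)
        d_pred = d[y_train == c].min()    # nearest same-class training point
        d_other = d[y_train != c].min()   # nearest other-class training point
        scores.append(d_other / (d_pred + 1e-12))
    return np.array(scores)

ts = trust_scores(X_tr, y_tr, X_te, preds)
acc = (preds == y_te).mean()
print(f"accuracy={acc:.3f}  mean trust score={ts.mean():.3f}")
```

A trustworthiness-aware benchmark could, for instance, report the distribution of such scores together with accuracy and flag low-score predictions for human review; this is one hypothetical design choice among the many the paper's discussion points toward.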
Cite this paper
Dholakia, A., Ellison, D., Hodak, M., Dutta, D.: Benchmarking considerations for trustworthy and responsible AI (panel). In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking. TPCTC 2022. LNCS, vol. 13860. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-29576-8_8