Abstract
The continuing growth of Artificial Intelligence (AI) adoption across enterprises and governments worldwide has fueled demand for trustworthy AI systems and applications. This need spans so-called Explainable or Interpretable AI through Responsible AI, driven by the underlying requirement for greater confidence in deploying AI as part of enterprise IT. AI-based use cases, both internal to organizations and external, customer- and user-facing, are increasingly expected to meet these demands. This paper describes the need for and definitions of trustworthiness and responsibility in AI systems, summarizes currently popular AI benchmarks, and discusses the challenges and opportunities in assessing and benchmarking the trustworthy and responsible aspects of AI systems and applications.
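The paper itself does not prescribe a particular metric, but as a rough illustration of what benchmarking trustworthiness beyond raw accuracy could involve, the sketch below scores each prediction of a classifier with a nearest-neighbor trust score (in the spirit of Jiang et al., "To trust or not to trust a classifier"): the ratio of the distance to the nearest training example of any other class over the distance to the nearest example of the predicted class. The synthetic dataset, logistic-regression model, and reporting format are illustrative assumptions, not part of this paper.

```python
# Minimal sketch (assumptions as noted above): report a per-prediction
# nearest-neighbor trust score alongside accuracy. Higher ratios suggest
# the prediction lies closer to its own class than to any other class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
preds = clf.predict(X_te)

def trust_scores(X_train, y_train, X_test, y_pred):
    scores = []
    for x, c in zip(X_test, y_pred):
        d = np.linalg.norm(X_train - x, axis=1)
        d_pred = d[y_train == c].min()    # nearest same-class training point
        d_other = d[y_train != c].min()   # nearest other-class training point
        scores.append(d_other / (d_pred + 1e-12))
    return np.array(scores)

ts = trust_scores(X_tr, y_tr, X_te, preds)
acc = (preds == y_te).mean()
print(f"accuracy={acc:.3f}  mean trust score={ts.mean():.3f}")
```

A trustworthiness-aware benchmark could, for instance, report the distribution of such scores together with accuracy and flag low-score predictions for human review; this is one hypothetical design choice among the many the paper's discussion points toward.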
Cite this paper
Dholakia, A., Ellison, D., Hodak, M., Dutta, D.: Benchmarking considerations for trustworthy and responsible AI (panel). In: Nambiar, R., Poess, M. (eds.) Performance Evaluation and Benchmarking. TPCTC 2022. LNCS, vol. 13860. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-29576-8_8