research-article

Automated Annotations for AI Data and Model Transparency

Published: 11 December 2021

Abstract

The data and artificial intelligence (AI) revolution has had a massive impact on enterprises, governments, and society alike. It is fueled by two key factors. First, data have become increasingly abundant and are often openly available. Enterprises have more data than they can process, and governments are spearheading open data initiatives by setting up data portals such as data.gov and releasing large amounts of data to the public. Second, AI development is becoming increasingly democratized: open-source frameworks have enabled even individual developers to build sophisticated AI systems. But with such ease of use comes the potential for irresponsible use of data.
Ensuring that AI systems adhere to a set of ethical principles is one of the major challenges of our age. We believe that data and model transparency has a key role to play in mitigating the deleterious effects of AI systems. In this article, we describe a framework that synthesizes ideas from domains such as data transparency, data quality, and data governance to tackle this problem. Specifically, we advocate an approach based on automated annotations of both the data and the AI model, which has a number of appealing properties. Enterprises could use these annotations to gain visibility into potential issues, prepare data transparency reports, create policies and ensure compliance with them, and evaluate the readiness of data for diverse downstream AI applications. We propose a model architecture and enumerate the key components needed to meet these requirements. Finally, we describe a number of interesting challenges and opportunities.
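One way to picture the automated data annotations the abstract advocates is a routine that scans a dataset and attaches machine-generated metadata to each column. The sketch below is a minimal illustration under our own assumptions, not the authors' architecture: the `ColumnAnnotation` record, the `annotate` function, the specific checks (missing-value fraction, distinct-value count, mixed value types), and the 0.2 missingness threshold are all hypothetical. It assumes rows arrive as Python dictionaries whose values are hashable scalars, with `None` (or an absent key) meaning "missing".

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ColumnAnnotation:
    """Hypothetical per-column annotation record (not the article's schema)."""
    name: str
    inferred_type: str
    missing_fraction: float
    distinct_count: int
    issues: list[str] = field(default_factory=list)


def infer_type(values: list[Any]) -> str:
    """Return the most common Python type name among non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    return Counter(type(v).__name__ for v in present).most_common(1)[0][0]


def annotate(rows: list[dict[str, Any]], max_missing: float = 0.2) -> list[ColumnAnnotation]:
    """Compute simple data-quality annotations for every column of a table.

    `max_missing` is an illustrative threshold, not a value from the article.
    """
    columns = sorted({key for row in rows for key in row})
    annotations = []
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys count as missing
        present = [v for v in values if v is not None]
        missing = (len(values) - len(present)) / len(values)
        ann = ColumnAnnotation(
            name=col,
            inferred_type=infer_type(values),
            missing_fraction=missing,
            distinct_count=len(set(present)),  # assumes hashable values
        )
        # Flag potential issues so they become visible in a transparency report.
        if missing > max_missing:
            ann.issues.append(f"missing fraction {missing:.2f} exceeds {max_missing}")
        if len({type(v).__name__ for v in present}) > 1:
            ann.issues.append("mixed value types")
        annotations.append(ann)
    return annotations


if __name__ == "__main__":
    rows = [
        {"age": 34, "zip": "10001"},
        {"age": None, "zip": "10002"},
        {"age": "41", "zip": "10001"},
    ]
    for a in annotate(rows):
        print(a.name, a.inferred_type, round(a.missing_fraction, 2), a.issues)
```

In this toy table the `age` column would be flagged both for exceeding the missingness threshold and for mixing integers with strings, while `zip` would carry no issues. Annotations of this kind could feed the uses the abstract lists, such as surfacing potential issues or assembling a transparency report; annotating the AI model itself would need separate checks that this sketch does not attempt.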



Information

Published In

Journal of Data and Information Quality, Volume 14, Issue 1
March 2022
61 pages
ISSN:1936-1955
EISSN:1936-1963
DOI:10.1145/3505184

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 December 2021
Accepted: 01 April 2021
Revised: 01 February 2021
Received: 01 October 2020
Published in JDIQ Volume 14, Issue 1


Author Tags

  1. Data transparency
  2. data cleaning
  3. machine learning

Qualifiers

  • Research-article
  • Refereed


Bibliometrics

  • Total Citations: 0
  • Total Downloads: 711
  • Downloads (last 12 months): 147
  • Downloads (last 6 weeks): 22

Reflects downloads up to 25 Feb 2025.
