Abstract
The rapid proliferation of AI models has underscored the importance of thorough documentation, which enables users to understand, trust and effectively use these models in various applications. Although developers are encouraged to produce model cards, it is not clear how much or what information these cards contain. In this study we conduct a comprehensive analysis of 32,111 AI model cards on Hugging Face, a leading platform for distributing and deploying AI models. Our investigation sheds light on prevailing model card documentation practices. Most AI models with a substantial number of downloads provide model cards, although their informativeness is uneven. We find that sections addressing environmental impact, limitations and evaluation exhibit the lowest filled-out rates, whereas the training section is the most consistently filled out. We analyse the content of each section to characterize practitioners' priorities. Interestingly, there is considerable discussion of data, sometimes with equal or even greater emphasis than the model itself. Our study provides a systematic assessment of community norms and practices surrounding model documentation through large-scale data science and linguistic analysis.
Data availability
The Hugging Face model cards data are public on Hugging Face at https://huggingface.co/models and can be accessed through the Hugging Face Hub API at https://huggingface.co/docs/huggingface_hub/package_reference/hf_api.
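As an illustration of this access route, the following is a minimal sketch (not the authors' analysis pipeline) of retrieving model metadata and model cards with the huggingface_hub Python package; the model ID and query parameters are chosen for illustration only.

```python
# Minimal sketch of pulling model metadata and model cards via the
# Hugging Face Hub API; not the authors' analysis pipeline.
from huggingface_hub import HfApi, ModelCard

api = HfApi()

# List the most-downloaded models; the limit is illustrative.
for model in api.list_models(sort="downloads", direction=-1, limit=5):
    print(model.id, model.downloads)

# Load the model card (README.md) of one model as raw markdown.
card = ModelCard.load("bert-base-uncased")
print(card.text[:500])  # body of the card, excluding the YAML metadata header
```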
Code availability
The analysis code is publicly available at https://github.com/Weixin-Liang/AI-model-card-analysis-HuggingFace (ref. 62).
References
Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A. & Zou, J. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186, 1772–1791 (2023).
Liang, W. et al. Advances, challenges and opportunities in creating data for trustworthy AI. Nat. Mach. Intell. 4, 669–677 (2022).
Arrieta, A. B. et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion 58, 82–115 (2020).
Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (Lulu, 2020).
Shen, H. et al. Value cards: an educational toolkit for teaching social impacts of machine learning through deliberation. In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 850–861 (ACM, 2021).
Seifert, C., Scherzinger, S. & Wiese, L. Towards generating consumer labels for machine learning models. In 2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI) 173–179 (IEEE, 2019).
Mitchell, M. et al. Model cards for model reporting. In Proc. Conference on Fairness, Accountability, and Transparency 220–229 (ACM, 2019).
Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (ACM, 2021).
Arnold, M. et al. Factsheets: increasing trust in AI services through supplier's declarations of conformity. IBM J. Res. Dev. 63, 1–13 (2019).
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. 54, 1–35 (2021).
He, B. et al. Blinded, randomized trial of sonographer versus AI cardiac function assessment. Nature 616, 520–524 (2023).
Ribeiro, M. T., Singh, S. & Guestrin, C. 'Why should I trust you?' Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).
Raji, I. D. & Yang, J. About ML: annotation and benchmarking on understanding and transparency of machine learning lifecycles. Preprint at https://arxiv.org/abs/1912.06166 (2019).
Diakopoulos, N. Accountability in algorithmic decision making. Commun. ACM 59, 56–62 (2016).
Selbst, A. & Powles, J. ‘Meaningful information’ and the right to explanation. In Conference on Fairness, Accountability and Transparency 48–48 (PMLR, 2018).
Kulesza, T., Burnett, M., Wong, W.-K. & Stumpf, S. Principles of explanatory debugging to personalize interactive machine learning. In Proc. 20th International Conference on Intelligent User Interfaces 126–137 (ACM, 2015).
Holland, S., Hosny, A., Newman, S., Joseph, J. & Chmielinski, K. The dataset nutrition label: a framework to drive higher data quality standards. Preprint at https://arxiv.org/abs/1805.03677 (2018).
Gebru, T. et al. Datasheets for datasets. Commun. ACM 64, 86–92 (2021).
Hugging Face Model Card Guidebook (Hugging Face, accessed 7 May 2023).
Bracamonte, V., Pape, S., Löbner, S. & Tronnier, F. Effectiveness and information quality perception of an AI model card: a study among non-experts. In Proc. 2023 20th Annual International Conference on Privacy, Security and Trust (PST) 1–7 (IEEE, 2023).
Conover, M. et al. Hello Dolly: Democratizing the Magic of ChatGPT with Open Models (Hugging Face, 2023); https://huggingface.co/databricks/dolly-v1-6b
Chiang, W.-L. et al. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality (LMSYS, 2023).
Bender, E. M. & Friedman, B. Data statements for natural language processing: toward mitigating system bias and enabling better science. Trans. Assoc. Comput. Linguistics 6, 587–604 (2018).
Bender, E. M., Friedman, B. & McMillan-Major, A. A Guide for Writing Data Statements for Natural Language Processing (Tech Policy Lab, 2021).
Pushkarna, M., Zaldivar, A. & Kjartansson, O. Data cards: purposeful and transparent dataset documentation for responsible AI. In 2022 ACM Conference on Fairness, Accountability, and Transparency 1776–1826 (ACM, 2022).
Brundage, M. et al. Toward trustworthy AI development: mechanisms for supporting verifiable claims. Preprint at https://arxiv.org/abs/2004.07213 (2020).
Jobin, A., Ienca, M. & Vayena, E. The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1, 389–399 (2019).
Mackiewicz, R. E. Overview of IEC 61850 and benefits. In 2006 IEEE Power Engineering Society General Meeting 8 (IEEE, 2006).
Taori, R. et al. Stanford Alpaca: An Instruction-Following LLaMA Model (Hugging Face, 2023); https://huggingface.co/tatsu-lab/alpaca
Köpf, A. et al. OpenAssistant conversations – democratizing large language model alignment. Preprint at https://arxiv.org/abs/2304.07327 (2023).
Trending Model (Hugging Face, accessed 12 April 2023); https://huggingface.co/?trending=model
Displaying Carbon Emissions for Your Model (Hugging Face, accessed 12 April 2023); https://huggingface.co/docs/hub/model-cards-co2
AutoTrain (Hugging Face, accessed 12 April 2023); https://huggingface.co/autotrain
Model Card User Studies (Hugging Face, accessed 12 April 2023); https://huggingface.co/docs/hub/model-cards-user-studies
Smith, J. J., Amershi, S., Barocas, S., Wallach, H. M. & Vaughan, J. W. Real ML: recognizing, exploring, and articulating limitations of machine learning research. In 2022 ACM Conference on Fairness, Accountability, and Transparency (ACM, 2022).
Ioannidis, J. P. A. Limitations are not properly acknowledged in the scientific literature. J. Clin. Epidemiol. 60, 324–329 (2007).
Sambasivan, N. et al. 'Everyone wants to do the model work, not the data work': data cascades in high-stakes AI. In Proc. 2021 CHI Conference on Human Factors in Computing Systems 1–15 (ACM, 2021).
State of Data Science 2020 (Anaconda, accessed 22 May 2023); https://www.anaconda.com/state-of-data-science-2020
Wolf, T. et al. Transformers: state-of-the-art natural language processing. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations 38–45 (ACL, 2020).
McMillan-Major, A. et al. Reusable templates and guides for documenting datasets and models for natural language processing and generation: a case study of the Hugging Face and GEM data and model cards. In Proc. 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021) (eds Bosselut, A. et al.) 121–135 (ACL, 2021).
Kaggle ML and DS Survey (Kaggle, accessed 27 August 2020); https://www.kaggle.com/c/kaggle-survey-2019
Halevy, A., Norvig, P. & Pereira, F. The unreasonable effectiveness of data. IEEE Intell. Syst. 24, 8–12 (2009).
Sculley, D. et al. Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems Vol. 28 (NeurIPS, 2015).
Qiu, H. S., Li, Y. L., Padala, S., Sarma, A. & Vasilescu, B. The signals that potential contributors look for when choosing open-source projects. In Proc. ACM on Human–Computer Interaction Vol. 3, 1–29 (ACM, 2019).
Vasilescu, B. et al. Gender and tenure diversity in GitHub teams. In Proc. 33rd Annual ACM Conference on Human Factors in Computing Systems 3789–3798 (ACM, 2015).
Begel, A., Bosch, J. & Storey, M.-A. Social networking meets software development: perspectives from GitHub, MSDN, Stack Exchange, and Topcoder. IEEE Softw. 30, 52–66 (2013).
Fan, Y., Xia, X., Lo, D., Hassan, A. E. & Li, S. What makes a popular academic AI repository? Empirical Softw. Eng. 26, 1–35 (2021).
Fiesler, C., Garrett, N. & Beard, N. What do we teach when we teach tech ethics? A syllabi analysis. In Proc. 51st ACM Technical Symposium on Computer Science Education 289–295 (ACM, 2020).
Reich, R., Sahami, M., Weinstein, J. M. & Cohen, H. Teaching computer ethics: a deeply multidisciplinary approach. In Proc. 51st ACM Technical Symposium on Computer Science Education 296–302 (ACM, 2020).
Bates, J. et al. Integrating fate/critical data studies into data science curricula: where are we going and how do we get there? In Proc. 2020 Conference on Fairness, Accountability, and Transparency 425–435 (ACM, 2020).
Leidig, P. M. & Cassel, L. ACM taskforce efforts on computing competencies for undergraduate data science curricula. In Proc. 2020 ACM Conference on Innovation and Technology in Computer Science Education 519–520 (ACM, 2020).
Chmielinski, K. S. et al. The dataset nutrition label (2nd gen): leveraging context to mitigate harms in artificial intelligence. Preprint at https://arxiv.org/abs/2201.03954 (2022).
Hutchinson, B. et al. Towards accountability for machine learning datasets: practices from software engineering and infrastructure. In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 560–575 (ACM, 2021).
Zou, J. & Schiebinger, L. Design AI so that it’s fair. Nature 559, 324–326 (2018).
Regulation (EU) 2016/679 of the European Parliament and of the Council. Off. J. Eur. Union L119, 1–88 (2016).
Goodman, B. & Flaxman, S. European union regulations on algorithmic decision-making and a ‘right to explanation’. AI Magazine 38, 50–57 (2017).
Grootendorst, M. BERTopic: neural topic modeling with a class-based TF-IDF procedure. Preprint at https://arxiv.org/abs/2203.05794 (2022).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018).
Reimers, N. & Gurevych, I. Making monolingual sentence embeddings multilingual using knowledge distillation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing 4512–4525 (ACL, 2020).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
McInnes, L. & Healy, J. Accelerated hierarchical density based clustering. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW) 33–42 (IEEE, 2017).
Liang, W. & Yang, X. AI-Model-Card-Analysis-HuggingFace: Analysis of AI Model Cards on Hugging Face Hub (Zenodo, 2023); https://doi.org/10.5281/zenodo.11179952
Acknowledgements
We thank D. McFarland and H. Fang for discussions. J.Z. is supported by the National Science Foundation (grant nos. CCF 1763191 and CAREER 1942926), the US National Institutes of Health (grant nos. P30AG059307 and U01MH098953), and grants from Stanford HAI and the Chan Zuckerberg Initiative.
Author information
Authors and Affiliations
Contributions
W.L., N.R., X.Y. and D.S.S. designed the study framework and oversaw the systematic analysis. W.L. and X.Y. conducted the linguistic analysis of the model cards. W.L., X.Y. and J.Z. wrote the paper, with substantial input from all authors. N.R., E.O., E.W. and Y.C. contributed to data collection and preprocessing for the intervention study. J.Z. provided the overall direction and planning of the project.
Corresponding author
Ethics declarations
Competing interests
N.R. and E.O. are employees of Hugging Face. The other authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Odd Erik Gundersen, Arvind Narayanan, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Adherence to Hugging Face Model Card Norms.
This figure reveals that there is considerable room for improvement in model card adherence to established community norms. Specifically, only 20% of the top 100 model cards and 10.2% of the top 500 model cards fully incorporate all recommended sections. Furthermore, there is a significant correlation between a model card’s adherence to community standards and the model’s downloads. The fraction of model cards that adhere to the community norms, grouped by download frequency, is displayed on the y-axis with error bars representing the SEM within each group.
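To make the grouping concrete, here is a hedged pandas sketch of the aggregation behind such a plot; the column names (download_group, adheres_to_norms) are hypothetical and not taken from the authors' code.

```python
import pandas as pd

# Hypothetical input: one row per model card, with a download-frequency
# bucket and a flag for adherence to the recommended sections.
df = pd.DataFrame({
    "download_group": ["0-10", "0-10", "10-100", "10-100", "10-100"],
    "adheres_to_norms": [False, True, True, True, False],
})

# Fraction of adhering cards and SEM within each download group,
# corresponding to the y-axis values and error bars described above.
stats = (
    df.assign(adheres=df["adheres_to_norms"].astype(float))
      .groupby("download_group")["adheres"]
      .agg(fraction="mean", sem="sem")
)
print(stats)
```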
Extended Data Fig. 2 Differences in Model Card Practices Between Organizational and Individual Accounts.
This figure illustrates the disparities in model card practices between organizational and individual accounts. Download rankings are based on the entirety of model cards. For each account type, we count models of that type in every download group and calculate the percentage meeting the specified criteria. It highlights (a) The Degree of Adherence to Model Card Standards and (b) The Completion Rate of the Limitations Section. Organizational accounts show significantly greater compliance with model card norms, most notably in their more thorough documentation of limitations across download groups.
Extended Data Fig. 3 In-depth Analysis of Section Word Counts in Model Cards.
(a) Comparative Assessment of Average Section Lengths in Model Cards Based on Word Count. This figure displays the average section length, measured in word count, among completed sections for all model cards, the top 1000 model cards, and the top 100 model cards. Sections such as How to Start, Training, and Limitations are substantially longer, while Citation, Evaluation, Environmental Impact, and Intended Uses are relatively shorter. Interestingly, despite its lower completion rate, the Limitations section exhibits one of the highest average word counts (161 words in the top 1000 model cards). (b-c) Disparate Community Attention Patterns Across Model Card Sections, Analysed for both the top 100 model cards (b) and all model cards (c). The Environmental Impact section demonstrates both a low completion rate and a low average word count, indicating limited community attention. In contrast, the Training section displays high completion rates and average word counts, signifying greater community engagement.
Extended Data Fig. 4 Temporal Trends in the Fraction of Model Cards Containing a Limitations Section.
This figure illustrates the quarterly trend in the proportion of model cards that contain a Limitations section, from 2020 to 2022. It highlights a noticeable decline in the occurrence of Limitations sections in model cards over time. Error bars in the plot represent the SEM, indicating the variability of the data within each quarter.
Extended Data Fig. 5 Model Card Intervention Study.
(a) Experimental design: A schematic representation of the Model Card Intervention Study, delineating the selection of models, division into treatment (two batches) and control groups, and the model card intervention process. Analysis is conducted on the document level (26 in batch 1, 16 in batch 2, and 92 in control). (b) Outcome: Box plots displaying the percentage change in average weekly downloads for the treatment and control groups in Batches 1 and 2. For each colour-filled box, three horizontal lines correspond to the 25th, 50th, and 75th percentiles; the upper (lower) whiskers extend from the 75th (25th) percentile to the largest (smallest) value no further than 1.5 × the interquartile range. Statistical significance (two-sided p-values) derived from a difference-in-differences analysis (using robust linear regression) is included for both batches. Overall, our analysis revealed a moderate effect of model cards on model downloads.
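For readers unfamiliar with this estimator, the snippet below is a minimal sketch of a difference-in-differences specification fitted with robust linear regression in statsmodels; the data and variable names (treated, post, weekly_downloads) are hypothetical and do not reproduce the authors' analysis.

```python
# Hedged sketch of a difference-in-differences estimate via robust
# linear regression (statsmodels); data and variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# One row per model per period: average weekly downloads, a treatment
# flag (model card added) and a post-intervention period flag.
df = pd.DataFrame({
    "weekly_downloads": [120, 150, 80, 85, 200, 260, 90, 95],
    "treated":          [1,   1,   0,  0,  1,   1,   0,  0],
    "post":             [0,   1,   0,  1,  0,   1,   0,  1],
})

# The coefficient on treated:post is the difference-in-differences
# estimate of the intervention's effect on downloads.
model = smf.rlm("weekly_downloads ~ treated * post", data=df).fit()
print(model.summary())
```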
Supplementary information
Supplementary Information
Additional information on the model card intervention study.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liang, W., Rajani, N., Yang, X. et al. Systematic analysis of 32,111 AI model cards characterizes documentation practice in AI. Nat Mach Intell 6, 744–753 (2024). https://doi.org/10.1038/s42256-024-00857-z
DOI: https://doi.org/10.1038/s42256-024-00857-z