  • Comment
The incentive gap in data work in the era of large models

There are repeated calls in the AI community to prioritize data work — collecting, curating, analysing and otherwise considering the quality of data. But this is not practised as much as advocates would like, often because of a lack of institutional and cultural incentives. One way to encourage data work would be to reframe it as more technically rigorous, and thereby integrate it into more-valued lines of research such as model innovation.

Author information

Corresponding authors

Correspondence to Katy Ilonka Gero, Payel Das or Kush R. Varshney.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Margaret Mitchell for their contribution to the peer review of this work.

About this article

Cite this article

Gero, K.I., Das, P., Dognin, P. et al. The incentive gap in data work in the era of large models. Nat Mach Intell 5, 565–567 (2023). https://doi.org/10.1038/s42256-023-00673-x

  • DOI: https://doi.org/10.1038/s42256-023-00673-x
