The incentive gap in data work in the era of large models

Gero, Katy Ilonka; Das, Payel; Dognin, Pierre; Padhi, Inkit; Sattigeri, Prasanna; Varshney, Kush R.

doi:10.1038/s42256-023-00673-x

Comment
Published: 22 June 2023

Nature Machine Intelligence volume 5, pages 565–567 (2023)

There are repeated calls in the AI community to prioritize data work — collecting, curating, analysing and otherwise considering the quality of data. But this is not practised as much as advocates would like, often because of a lack of institutional and cultural incentives. One way to encourage data work would be to reframe it as more technically rigorous, and thereby integrate it into more-valued lines of research such as model innovation.

Author information

Authors and Affiliations

IBM Research–T. J. Watson Research Center, Yorktown Heights, NY, USA
Katy Ilonka Gero, Payel Das, Pierre Dognin, Inkit Padhi, Prasanna Sattigeri & Kush R. Varshney
Columbia University, New York, NY, USA
Katy Ilonka Gero

Corresponding authors

Correspondence to Katy Ilonka Gero, Payel Das or Kush R. Varshney.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Margaret Mitchell for their contribution to the peer review of this work.

About this article

Cite this article

Gero, K.I., Das, P., Dognin, P. et al. The incentive gap in data work in the era of large models. Nat Mach Intell 5, 565–567 (2023). https://doi.org/10.1038/s42256-023-00673-x

Published: 22 June 2023
Issue Date: June 2023
DOI: https://doi.org/10.1038/s42256-023-00673-x

This article is cited by

Getting real about synthetic data ethics
- Danielle Shanley
- Joshi Hogenboom
- Darian Meacham
EMBO Reports (2024)