research-article

From Bias to Repair: Error as a Site of Collaboration and Negotiation in Applied Data Science Work

Published: 16 April 2023

Abstract

Managing error has become an increasingly central and contested arena within data science work. While recent scholarship in artificial intelligence and machine learning has focused on limiting and eliminating error, practitioners have long used error as a site of collaboration and learning vis-à-vis labelers, domain experts, and the worlds data scientists seek to model and understand. Drawing from work in CSCW, STS, HCML, and repair studies, as well as from multi-sited ethnographic fieldwork within a government institution and a non-profit organization, we move beyond the notion of error as an edge case or anomaly to make three basic arguments. First, error discloses or calls to attention existing structures of collaboration unseen or underappreciated under 'working' systems. Second, error calls into being new forms and sites of collaboration (including, sometimes, new actors). Third, error redeploys old sites and actors in new ways, often through restructuring relations of hierarchy and expertise which recenter or devalue the position of different actors. We conclude by discussing how an artful living with error can better support the creative strategies of negotiation and adjustment which data scientists and their collaborators engage in when faced with disruption, breakdown, and friction in their work.


    • Published in

      Proceedings of the ACM on Human-Computer Interaction, Volume 7, Issue CSCW1
      April 2023
      3836 pages
      EISSN: 2573-0142
      DOI: 10.1145/3593053

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States
