ABSTRACT
In this paper, we analyze the relation between data-related biases and practices of data annotation by placing them in the context of the market economy. We understand annotation as a praxis related to the sensemaking of data and investigate annotation practices for vision models, focusing on the values prioritized by industrial decision-makers and practitioners. The quality of data is critical for machine learning models, as it holds the power to (mis-)represent the population it is intended to analyze. For autonomous systems to be able to make sense of the world, humans first need to make sense of the data these systems will be trained on. This paper addresses this issue, guided by the following research questions: Which goals are prioritized by decision-makers at the data annotation stage? How do these priorities correlate with data-related bias issues? Focusing on work practices and their context, our research aims to understand the logics driving companies and their impact on the annotations performed. The study follows a qualitative design and is based on 24 interviews with relevant actors and extensive participant observation, including several weeks of fieldwork at two companies dedicated to data annotation for vision models in Buenos Aires, Argentina, and Sofia, Bulgaria. We argue that market-oriented values prevail over socially responsible approaches, based on three corporate priorities that inform work practices in this field and directly shape the annotations performed: profit (short deadlines connected to the strive for profit are prioritized over alternative approaches that could prevent biased outcomes), standardization (the push for standardized and, in many cases, reductive or biased annotations that make data fit the products and revenue plans of clients), and opacity (clients' power to impose their criteria on the annotations performed, criteria that in most cases remain opaque due to corporate confidentiality). Finally, we introduce three elements aimed at developing ethics-oriented practices of data annotation that could help prevent biased outcomes: transparency (documentation of data transformations, including information on responsibilities and criteria for decision-making), education (training on the potential harms caused by AI and its ethical implications, which could help data annotators and related roles adopt a more critical approach towards the interpretation and labeling of data), and regulation (clear guidelines for ethical AI developed at the governmental level and applied in both private and public organizations).
Index Terms
- Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-oriented Data Annotation Practices
- Recommendations