Abstract
Artificial intelligence (AI) and machine learning (ML) systems increasingly purport to deliver knowledge about people and the world. Unfortunately, they also seem to frequently present results that repeat or magnify biased treatment of racial and other vulnerable minorities. This paper proposes that at least some of the problems with AI’s treatment of minorities can be captured by the concept of epistemic injustice. To substantiate this claim, I argue that (1) pretrial detention and physiognomic AI systems commit testimonial injustice because their target variables reflect inaccurate and unjust proxies for what they claim to measure; (2) classification systems, such as facial recognition, commit hermeneutic injustice because their classification taxonomies, almost no matter how they are derived, reflect and perpetuate racial and other stereotypes; and (3) epistemic injustice better explains what is going wrong in these types of situations than does the more common focus on procedural (un)fairness.
Notes
John Symons and Ramón Alvarado (2022), for example, identify injustice in the opacity of ML systems. Giorgia Pozzi (2023a, 2023b) looks at epistemic injustice in the context of automated opioid misuse prediction systems in healthcare. Epistemic injustice has also been identified in social media algorithms (Stewart, Cichocki, & McLeod, 2022) and natural-language processing systems (De Proost & Pozzi, 2023; Laacke, 2023).
For facial recognition, see (Buolamwini & Gebru, 2018); on policing, see the discussion and references below.
Similar obstacles have faced Black defendants improperly identified by facial recognition systems. See, e.g., Alicia Solow-Niederman’s account of the month Robert Williams spent in detention after being misidentified by facial recognition (2023, pp. 131–133). Solow-Niederman takes Williams’ case as exemplary of the “grey holes” that algorithmic systems can introduce: people are nominally protected from those systems, but their protections are practically unusable.
The example is analogous to Fabian Beigang’s example of a disease-modeling algorithm that correctly identifies 95% of people who have a given disease, and does so robustly across gender. However, because disease prevalence is much higher among men, the positive predictive value of the algorithm—“the probability of actually having the disease given that one receives a positive test result”—is much higher for men than women, who will much more frequently be erroneously told they have the disease. Beigang notes that “this is not due to bias in the testing device, but just to the prevalence of the disease, which differs across genders” (2023, p. 175).
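Beigang's point is straightforward Bayesian arithmetic, and can be sketched as follows (the sensitivity, specificity, and prevalence figures below are hypothetical, chosen only to exhibit the structure of the example, not taken from his paper):

```python
# A minimal sketch of Beigang-style positive predictive value (PPV) arithmetic.
# All numbers are hypothetical illustrations.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(disease | positive test), via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same test accuracy for both groups (95% sensitivity and specificity),
# but a prevalence of 10% among men versus 1% among women.
ppv_men = ppv(0.95, 0.95, 0.10)
ppv_women = ppv(0.95, 0.95, 0.01)
print(f"PPV (men):   {ppv_men:.1%}")
print(f"PPV (women): {ppv_women:.1%}")
```

With these stipulated numbers, a positive result means something quite different for each group even though the test itself is unbiased: most positive results among women are false positives, simply because the disease is rarer there.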
For the impossibility theorem, see especially (Beigang, 2023).
Eva (2022) proposes base-rate tracking as a fairness criterion.
These epistemic problems also present a limitation to base-rate tracking: it only works if the base rate of the phenomenon in question is knowable. This is a particular problem for things like crime rates.
Cf. (Pozzi, 2023a), outlining the stigmatizing effects of poor proxy variable selection in opioid misuse risk assessment systems.
The New Inquiry’s “Heatmap of White Collar Crime” makes the point vividly. See: https://whitecollar.thenewinquiry.com/
For a thorough dismantling of this paper that connects it to physiognomic systems, see (Agüera y Arcas, Mitchell, & Todorov, 2017). In an apologia response piece, the authors claim that their only intent is academic research, and that they are shocked that it was taken otherwise, though they admit that “taking a court conviction at its face value, i.e., as the ‘ground truth’ for machine learning, was indeed a serious oversight on our part.” They also emphatically insist that critics commit a base rate fallacy: China has a low crime rate, so someone flagged by the algorithm “is found to have a probability of only 4.39% to break the law, despite being tested positive by a method of unbelievably high accuracy” (Wu & Zhang, 2017, pp. 1, 2). It seems to me that this underscores the epistemic injustice of taking a 4% chance that someone will commit a crime over evidence that they might provide through testimony of virtually any sort. Of course, someone who tests positive is predicted to be much more likely to commit a crime than someone who tests negative, and this jump in relative risk is all that a carceral system needs to claim that it ought to expand surveillance, harassment, etc. for a positive case.
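The contrast between small absolute risk and large relative risk can be made concrete with a bit of Bayes arithmetic. The base rate and error rates below are hypothetical stand-ins, chosen only to land in the neighborhood of the figures Wu and Zhang report; none are taken from their paper:

```python
# Hypothetical illustration of the base rate point: a classifier with low error
# rates still yields a small absolute probability of offending given a positive
# result when the base rate is low -- yet a large *relative* jump over a
# negative result.

def posterior(p_flag_given_crime, p_flag_given_no_crime, base_rate, flagged=True):
    """P(crime | flag result), via Bayes' theorem."""
    if flagged:
        num = p_flag_given_crime * base_rate
        den = num + p_flag_given_no_crime * (1 - base_rate)
    else:
        num = (1 - p_flag_given_crime) * base_rate
        den = num + (1 - p_flag_given_no_crime) * (1 - base_rate)
    return num / den

base_rate = 0.005  # stipulate that 0.5% of the population offends
p_pos = posterior(0.9, 0.1, base_rate, flagged=True)   # a few percent
p_neg = posterior(0.9, 0.1, base_rate, flagged=False)  # a small fraction of that
print(f"P(crime | flagged):     {p_pos:.2%}")
print(f"P(crime | not flagged): {p_neg:.3%}")
print(f"relative risk: {p_pos / p_neg:.0f}x")
```

On these stipulated numbers the absolute posterior stays in single digits, while the relative risk multiplies many times over, which is precisely the gap a carceral system can exploit.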
This could “be due to a biological difference such as a difference in facial brightness. It could also be due to a group preference for makeup, or perhaps related to how the photograph is taken. Some people might use a mobile phone to take photographs for dating profiles and others might have them taken in a professionally lit photographic studio. The types of post-processing applied to photographs might vary between groups. It is also possible that photographs from different types of mobile phones or those that are uploaded to different dating websites are processed with different image compression algorithms and that there are artifacts resulting from these methods that are easily detectable by ML models” (Leuner, 2019, p. 51).
Interestingly for the argument being developed here, those who felt that their harassment didn’t readily fit HeartMob’s classificatory system tended to report feeling unsupported (Blackwell et al., 2017).
For example, facial recognition systems tend to under-recognize dark-skinned women because they rely on training data that overrepresents white men (Buolamwini & Gebru, 2018). Natural language processing datasets underrepresent non-Western languages (Bender et al., 2021). Object recognition systems perform poorly on household objects from low- and middle-income countries (LMICs) because their datasets rely on Flickr and English (DeVries, Misra, Wang, & Maaten, 2019). The English Colossal Clean Crawled Corpus contains surprising amounts of military text (from .mil domains), patents, and machine-generated translations, especially of non-English patents. The implications of this are unclear, but it should be apparent that most people do not speak in the idiom of patent applications. Initial efforts to curate datasets can introduce further such problems. For example, the cleaned version of the English Colossal Clean Crawled Corpus disproportionately blocks out mentions of sexual orientation, as well as texts that appear to be African-American English or Latinx English (Birhane et al., 2021b).
For example, documentation for the computer vision datasets analyzed by Scheuerman, Hanna, and Denton (2021) prioritized scale and comprehensiveness in the service of higher accuracy. Bender et al. (2021) note a similar trajectory for NLP datasets, as do Birhane et al. (2021b) for multimodal datasets.
This state of affairs does not seem to trouble most of those who work on the datasets. As Scheuerman, Hanna, and Denton summarize their comprehensive look at the documentation of several datasets, “valuing efficiency was at the cost of care, valuing slow and thoughtful decision-making and data processes, considering more ethical ways to collect data and treat annotators, and seeking fairer compensation—or even reporting compensation—for data labor” and “in general, there was little to no discussion about ethics when conducting work with annotators or with human subjects as data instances” (Scheuerman et al., 2021, p. 25).
This is particularly troubling given the prevalence of nonconsensual pornography online; not only are victims harmed when that material is disseminated, they are then forced to further their own sexualization by serving as data for the classification of people who look like them. One recent study shows that 1 in 12 people, mostly women, have been victims of nonconsensual pornography at least once (Ruvalcaba & Eaton, 2020).
For a thorough discussion of the ways algorithmic systems embed values, and how those generate various forms of bias, see (Fazelpour & Danks, 2021). Beigang (2023) argues that the contradiction between predictive parity and equalized odds fairness can be resolved by accounting for different prevalence rates in the predicted populations. For example, to know if an algorithmic system discriminated against women in disease prediction, it would be necessary to know the prevalence of the disease in male and female subpopulations. This strategy essentially builds the importance of context into the assessment of algorithmic fairness, though it does not resolve the problems of knowing base rates, of whether the target variable is a good proxy for the underlying social phenomenon, or of whether reliance on (for example) carceral data is justified.
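The tension between the two fairness criteria can be checked numerically: hold the error rates (true and false positive rates) identical across two groups, so equalized odds is satisfied, and different base rates alone force different positive predictive values, violating predictive parity. All the rates below are hypothetical:

```python
# Toy numerical check of the tension Beigang addresses: with different base
# rates, equalized odds (equal TPR and FPR across groups) forces unequal
# predictive parity (unequal PPV). All figures are hypothetical.

def ppv_from_rates(tpr: float, fpr: float, base_rate: float) -> float:
    """Positive predictive value implied by a group's TPR, FPR, and base rate."""
    tp = tpr * base_rate
    fp = fpr * (1 - base_rate)
    return tp / (tp + fp)

tpr, fpr = 0.8, 0.2                  # identical error rates: equalized odds holds
base_rate_a, base_rate_b = 0.5, 0.1  # but the groups differ in prevalence

ppv_a = ppv_from_rates(tpr, fpr, base_rate_a)  # 0.8
ppv_b = ppv_from_rates(tpr, fpr, base_rate_b)  # ~0.31
print(f"PPV group A: {ppv_a:.2f}, PPV group B: {ppv_b:.2f}")
```

Beigang's resolution, on this reading, amounts to judging the PPV disparity in light of the prevalence difference rather than treating it as evidence of bias on its own.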
For a general critique of ideal theory, see (Mills, 2005).
References
Abebe, R., Barocas, S., Kleinberg, J., Levy, K., Raghavan, M., & Robinson, D. G. (2020). Roles for computing in social change. Paper presented at the Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain https://doi.org/10.1145/3351095.3372871
Acquisti, A. (2009). Nudging privacy: The behavioral economics of personal information. Security & Privacy, IEEE, 7(6), 82–85. https://doi.org/10.1109/MSP.2009.163
Acquisti, A., Brandimarte, L., & Loewenstein, G. (2015). Privacy and human behavior in the age of information. Science, 347(6221), 509–514. https://doi.org/10.1126/science.aaa1465
Agüera y Arcas, B., Mitchell, M., & Todorov, A. (2017). Physiognomy’s New Clothes. Retrieved from https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a
Agüera y Arcas, B., Todorov, A., & Mitchell, M. (2018). Do algorithms reveal sexual orientation or just expose our stereotypes? Retrieved from https://medium.com/@blaisea/do-algorithms-reveal-sexual-orientation-or-just-expose-our-stereotypes-d998fafdf477
Albright, A. (2019). If You Give a Judge a Risk Score: Evidence from Kentucky Bail Decisions. https://thelittledataset.com/about_files/albright_judge_score.pdf
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine Bias. ProPublica. Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Araujo, T., Helberger, N., Kruikemeier, S., & de Vreese, C. H. (2020). In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI & Society, 35(3), 611–623. https://doi.org/10.1007/s00146-019-00931-w
Austin, L. M. (2014). Enough about Me: Why Privacy is about Power, not Consent (or Harm). In A. Sarat (Ed.), A World without Privacy: What Law Can and Should Do (pp. 131–189). Cambridge University Press.
Barabas, C., Doyle, C., Rubinovitz, J., & Dinakar, K. (2020). Studying up: reorienting the study of algorithmic fairness around issues of power. Proceedings of the 2020 Conference on Fairness Accountability and Transparency https://doi.org/10.1145/3351095.3372859
Barabas, C. (2022). Refusal in data ethics: Re-imagining the code beneath the code of computation in the carceral state. Engaging Science, Technology, and Society, 8(2), 57–76.
Barocas, S., & Selbst, A. D. (2016). Big data’s disparate impact. California Law Review, 104, 671–732.
Beigang, F. (2023). Reconciling algorithmic fairness criteria. Philosophy & Public Affairs, 51(2), 166–190. https://doi.org/10.1111/papa.12233
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). https://doi.org/10.1145/3442188.3445922
Benjamin, R. (2019). Race after technology: Abolitionist tools for the new jim code. Wiley.
Benkler, Y. (2006). The wealth of networks: How social production transforms markets and freedom. Yale University Press.
Birhane, A., Prabhu, V. U., & Kahembwe, E. (2021b). Multimodal datasets: Misogyny, pornography, and malignant stereotypes. arXiv preprint. https://arxiv.org/abs/2110.01963
Birhane, A., Kalluri, P., Card, D., Agnew, W., Dotan, R., & Bao, M. (2021a). The values encoded in machine learning research. arXiv preprint. https://arxiv.org/abs/2106.15590
Birhane, A. (2021). Algorithmic injustice: A relational ethics approach. Patterns, 2(2), 100205. https://doi.org/10.1016/j.patter.2021.100205
Blackwell, L., Dimond, J., Schoenebeck, S., & Lampe, C. (2017). Classification and its consequences for online harassment: Design insights from heartmob. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3134659
Blodgett, S. L., Barocas, S., Daumé III, H., & Wallach, H. (2020). Language (Technology) is Power: A Critical Survey of “Bias” in NLP. Paper presented at the 58th Annual Meeting of the Association for Computational Linguistics, Online.
Browne, S. (2015). Dark matters: On the surveillance of blackness. Duke University Press.
Buchman, D. Z., Ho, A., & Goldberg, D. S. (2017). Investigating trust, expertise, and epistemic injustice in chronic pain. Journal of Bioethical Inquiry, 14(1), 31–42. https://doi.org/10.1007/s11673-016-9761-x
Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Paper presented at the Proceedings of the 1st Conference on Fairness, Accountability and Transparency, New York. http://proceedings.mlr.press
Carel, H., & Kidd, I. J. (2017). Epistemic Injustice in Medicine and Healthcare. In I. J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The routledge handbook of epistemic injustice (pp. 336–346). Routledge.
Citron, D. K. (2008). Technological due process. Washington University Law Review, 85, 1249–1313.
Citron, D. K. (2014). Hate crimes in cyberspace. Harvard University Press.
Citron, D. K., & Pasquale, F. (2014). The scored society: Due process for automated predictions. Washington University Law Review, 89, 1–33.
Collins, P. H. (2017). Intersectionality and Epistemic Injustice. In I. J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The Routledge handbook of epistemic injustice (pp. 115–124). Routledge.
Crawford, K., & Paglen, T. (2019, Sept. 19). Excavating AI: The Politics of Images in Machine Learning Training Sets. Retrieved from https://excavating.ai
Crawford, K. (2021). Atlas of AI. Yale University Press.
De Proost, M., & Pozzi, G. (2023). Conversational artificial intelligence and the potential for epistemic injustice. The American Journal of Bioethics, 23(5), 51–53. https://doi.org/10.1080/15265161.2023.2191020
Denton, E., Hanna, A., Amironesei, R., Smart, A., & Nicole, H. (2021). On the genealogy of machine learning datasets: A critical history of ImageNet. Big Data & Society, 8(2), 20539517211035956. https://doi.org/10.1177/20539517211035955
DeVries, T., Misra, I., Wang, C., & van der Maaten, L. (2019). Does object recognition work for everyone? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. arXiv preprint. https://arxiv.org/abs/1906.02659
Dwork, C., & Mulligan, D. K. (2013). It’s not Privacy, and It’s not Fair. Stanford Law Review Online, 66, 35–40.
Eubanks, V. (2017). Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin’s Press.
Eva, B. (2022). Algorithmic fairness and base rate tracking. Philosophy & Public Affairs, 50(2), 239–266. https://doi.org/10.1111/papa.12211
Fazelpour, S., & Danks, D. (2021). Algorithmic bias: Senses, sources, solutions. Philosophy Compass, 16(8), e12760. https://doi.org/10.1111/phc3.12760
Fleisher, W. (2021). What's fair about individual fairness? In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 480–490). Association for Computing Machinery.
Foucault, M. (1970 [1966]). The order of things. Random House.
Fricker, M. (2007). Epistemic Injustice: Power and the Ethics of Knowing. Oxford University Press.
Friedman, B., & Nissenbaum, H. (1996). Bias in computer systems. ACM Trans. Inf. Syst., 14(3), 330–347. https://doi.org/10.1145/230538.230561
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2020). Datasheets for datasets. Communications of the ACM, 64(12), 86–92.
Gerdon, F., Bach, R. L., Kern, C., & Kreuter, F. (2022). Social impacts of algorithmic decision-making: A research agenda for the social sciences. Big Data & Society, 9(1), 20539517221089304. https://doi.org/10.1177/20539517221089305
Gilman, M., & Green, R. (2018). The surveillance gap: The harms of extreme privacy and data marginalization. N.Y.U. Review of Law and Social Change, 42, 253–307.
Green, B., & Viljoen, S. (2020). Algorithmic realism: expanding the boundaries of algorithmic thought. Paper presented at the Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain. https://doi.org/10.1145/3351095.3372840
Green, B. (2020). Data science as political action: Grounding data science in a politics of justice. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3658431
Green, B. (2022). Escaping the impossibility of fairness: From formal to substantive algorithmic fairness. Philosophy & Technology, 35, 1–32. https://doi.org/10.1007/s13347-022-00584-6
Greene, D., Hoffmann, A. L., & Stark, L. (2019). Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning. Paper presented at the Proceedings of the 52nd Hawaii International Conference on System Sciences, Hawaii, USA.
Hanna, A., Denton, E., Smart, A., & Smith-Loud, J. (2020). Towards a critical race methodology in algorithmic fairness. Paper presented at the Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain.
Hedden, B. (2021). On statistical criteria of algorithmic fairness. Philosophy & Public Affairs, 49(2), 209–231. https://doi.org/10.1111/papa.12189
Hoffmann, A. L. (2019). Where fairness fails: Data, algorithms, and the limits of antidiscrimination discourse. Information, Communication & Society, 22(7), 900–915. https://doi.org/10.1080/1369118X.2019.1573912
Hu, M. (2015). Big data blacklisting. Florida Law Review, 67, 1735–1811.
Hu, M. (2017). Algorithmic Jim Crow. Fordham Law Review, 86(2), 633–696.
Hull, G. (2015). Successful failure: What foucault can teach us about privacy self-management in a world of Facebook and big data. Ethics and Information Technology, 17(2), 89–101. https://doi.org/10.1007/s10676-015-9363-z
Hull, G. (2021). The death of the data subject. Law Culture and the Humanities. https://doi.org/10.1177/17438721211049376
Hull, G. (2022). Infrastructure, modulation, portal: Thinking with foucault about how internet architecture shapes subjects. Techné Research in Philosophy and Technology, 26(1), 84–114. https://doi.org/10.5840/techne2022425155
Kalluri, P. (2020). Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature. https://doi.org/10.1038/d41586-020-02003-2
Katz, Y. (2020). Artificial whiteness: Politics and ideology in artificial intelligence. Columbia University Press.
Keyes, O., Hutson, J., & Durbin, M. (2019). A Mulching Proposal: Analysing and Improving an Algorithmic System for Turning the Elderly into High-Nutrient Slurry. Paper presented at the Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK. https://doi.org/10.1145/3290607.3310433
Keyes, O., & Creel, K. (2022). Artificial knowing otherwise. Feminist Philosophy Quarterly, 8(3/4), 1–26.
Kidd, I. J., Medina, J., & Pohlhaus, G. (2017). The routledge handbook of epistemic injustice. Routledge.
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802–5805. https://doi.org/10.1073/pnas.1218772110
Kroll, J. A. (2018). The fallacy of inscrutability. Philosophical Transactions of the Royal Society a: Mathematical, Physical and Engineering Sciences, 376(2133), 1–14. https://doi.org/10.1098/rsta.2018.0084
Laacke, S. (2023). Bias and epistemic injustice in conversational AI. The American Journal of Bioethics, 23(5), 46–48. https://doi.org/10.1080/15265161.2023.2191055
Le Bui, M., & Noble, S. U. (2020). We’re missing a moral framework of justice in artificial intelligence: On the limits, failings, and ethics of fairness. In M. D. Dubber, F. Pasquale, & S. Das (Eds.), The oxford handbook of ethics of AI. Oxford University Press.
Lerman, J. (2013). Big data and its exclusions. Stanford Law Review Online, 66, 55–63.
Leuner, J. (2019). A replication study: Machine learning models are capable of predicting sexual orientation from facial images. arXiv preprint. https://doi.org/10.48550/arXiv.1902.10739
Lin, T.-A., & Cameron Chen, P.-H. (2022). Artificial intelligence in a structurally unjust society. Feminist Philosophy Quarterly, 8(3/4), 1–32.
Malevé, N. (2019). An Introduction to Image Datasets. Retrieved from https://unthinking.photography/articles/an-introduction-to-image-datasets
Mason, R. (2021). Hermeneutical Injustice. In J. Khoo & R. K. Sterken (Eds.), The routledge handbook of social and political philosophy of language (pp. 247–258). Routledge.
Matz, S. C., Kosinski, M., Nave, G., & Stillwell, D. J. (2017). Psychological targeting as an effective approach to digital mass persuasion. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1710966114
Mayson, S. G. (2019). Bias in, bias out. Yale Law Journal, 128, 2218–2300.
Medina, J. (2018). Misrecognition and epistemic injustice. Feminist Philosophy Quarterly. https://doi.org/10.5206/fpq/2018.4.6233
Mills, C. W. (2005). “Ideal theory” as ideology. Hypatia, 20(3), 165–183. https://doi.org/10.1111/j.1527-2001.2005.tb00493.x
Mills, C. W. (2017). Ideology. In I. J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The routledge handbook of epistemic injustice (pp. 100–112). Routledge.
Mulligan, D. K., Koopman, C., & Doty, N. (2016). Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy. Philosophical Transactions of the Royal Society A, 374(20160118), 1–17.
Mulligan, D. K., Kroll, J. A., Kohli, N., & Wong, R. Y. (2019). This thing called fairness: Disciplinary confusion realizing a value in technology. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3359221
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.
Okidegbe, N. (2022). Discredited data. Cornell Law Review, 107 (forthcoming).
Pohlhaus, G., Jr. (2017). Varieties of Epistemic Injustice. In I. J. Kidd, J. Medina, & G. Pohlhaus (Eds.), The routledge handbook of epistemic injustice (pp. 13–26). Routledge.
Pozzi, G. (2023a). Automated opioid risk scores: A case for machine learning-induced epistemic injustice in healthcare. Ethics and Information Technology, 25(1), 3. https://doi.org/10.1007/s10676-023-09676-z
Pozzi, G. (2023b). Testimonial injustice in medical machine learning. Journal of Medical Ethics. https://doi.org/10.1136/jme-2022-108630
Ranchordás, S. (2022). Empathy in the digital administrative state. Duke Law Journal, 71, 1341–1389.
Rhue, L. (2018). Racial influence on automated perceptions of emotions. SSRN eLibrary. https://doi.org/10.2139/ssrn.3281765
Ruvalcaba, Y., & Eaton, A. A. (2020). Nonconsensual pornography among U.S. adults: A sexual scripts framework on victimization, perpetration, and health correlates for women and men. Psychology of Violence, 10(1), 68–78. https://doi.org/10.1037/vio0000233
Sankin, A., Mehrotra, D., Mattu, S., & Gilbertson, A. (2021). Crime Prediction Software Promised to Be Free of Biases. New Data Shows It Perpetuates Them. The Markup. Retrieved from https://themarkup.org/prediction-bias/2021/12/02/crime-prediction-software-promised-to-be-free-of-biases-new-data-shows-it-perpetuates-them
Scheuerman, M. K., Hanna, A., & Denton, E. (2021). Do datasets have politics? Disciplinary values in computer vision dataset development. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3476058
Scheuerman, M. K., Paul, J. M., & Brubaker, J. R. (2019). How computers see gender: An evaluation of gender classification in commercial facial analysis services. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3359246
Scheuerman, M. K., Wade, K., Lustig, C., & Brubaker, J. R. (2020). How we’ve taught algorithms to see identity: Constructing race and gender in image databases for facial analysis. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3392866
Selbst, A. D., boyd, d., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). Fairness and Abstraction in Sociotechnical Systems. Paper presented at the Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA
Selbst, A. D. (2017). Disparate impact in big data policing. Georgia Law Review, 52, 109–195.
Selinger, E., & Hartzog, W. (2019). The inconsentability of facial surveillance. Loyola Law Review, 66, 101–122.
Skinner-Thompson, S. (2021). Privacy at the Margins. Cambridge University Press.
Solove, D. J. (2013). Privacy self-management and the consent dilemma. Harvard Law Review, 126, 1880–1903.
Solow-Niederman, A. (2023). Algorithmic grey holes. Journal of Law and Innovation, 5(1), 116–139.
Stark, L., & Hutson, J. (2022). Physiognomic artificial intelligence. Fordham Intellectual Property, Media & Entertainment Law Journal, 32(4), 922–978.
State v. Loomis, 371 Wis. 2d 235 (Sup. Ct. Wisc. 2016).
Stevens, N., & Keyes, O. (2021). Seeing infrastructure: Race, facial recognition and the politics of data. Cultural Studies, 35(4–5), 833–853. https://doi.org/10.1080/09502386.2021.1895252
Stewart, H., Cichocki, E., & McLeod, C. (2022). A perfect storm for epistemic injustice: algorithmic targeting and sorting on social media. Feminist Philosophy Quarterly, 8(3/4), 1–29.
Symons, J., & Alvarado, R. (2022). Epistemic injustice and data science technologies. Synthese, 200(2), 87. https://doi.org/10.1007/s11229-022-03631-z
Tucker, E. (2022). Deliberate disorder: How policing algorithms make thinking about policing harder. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4047082
Waldman, A. E. (2021). Industry unbound: The inside story of privacy, data, and corporate power. Cambridge University Press.
Waldman, A. E. (2022). Disorderly content. Washington Law Review, 97(4), 907–976.
Waldman, A. E. (2023). Gender data in the automated administrative state. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4358437
Wang, Y., & Kosinski, M. (2018). Deep neural networks are more accurate than humans at detecting sexual orientation from facial images. Journal of Personality and Social Psychology, 114(2), 246–257. https://doi.org/10.1037/pspa0000098
Wardrope, A. (2015). Medicalization and epistemic injustice. Medicine, Health Care and Philosophy, 18(3), 341–352. https://doi.org/10.1007/s11019-014-9608-3
Weinberg, L. (2022). Rethinking fairness: An interdisciplinary survey of critiques of hegemonic ML fairness approaches. Journal of Artificial Intelligence Research, 74, 75–109. https://doi.org/10.1613/jair.1.13196
Wright, J. (2021). Suspect AI: Vibraimage emotion recognition technology and algorithmic opacity. Science, Technology and Society. https://doi.org/10.1177/09717218211003411
Wu, X., & Zhang, X. (2016). Automated inference on criminality using face images. arXiv preprint. https://doi.org/10.48550/arXiv.1611.04135
Wu, X., & Zhang, X. (2017). Responses to critiques on machine learning of criminality perceptions. arXiv preprint. https://doi.org/10.48550/arXiv.1611.04135
Cite this article
Hull, G. Dirty data labeled dirt cheap: epistemic injustice in machine learning systems. Ethics Inf Technol 25, 38 (2023). https://doi.org/10.1007/s10676-023-09712-y