DOI: 10.1145/3411764.3445604

Towards Fairness in Practice: A Practitioner-Oriented Rubric for Evaluating Fair ML Toolkits

Published: 07 May 2021

Abstract

To support fairness-forward thinking by machine learning (ML) practitioners, fairness researchers have created toolkits that aim to turn state-of-the-art research contributions into easily accessible APIs. Despite these efforts, recent research indicates a disconnect between the needs of practitioners and the tools offered by fairness research. By engaging 20 ML practitioners in a simulated scenario in which they use fairness toolkits to make critical decisions, this work draws on practitioner feedback to inform recommendations for the design and creation of fair ML toolkits. Our survey and interview data indicate that although fair ML toolkits strongly shape users’ decision-making, the design of the toolkits and their presentation of fairness results leave much to be desired. To support the future development and evaluation of toolkits, this work offers a rubric for identifying the critical components of fair ML toolkits.
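
For context on the kind of tool the study examines: fairness toolkits (e.g., Fairlearn, AI Fairness 360) typically expose group-fairness checks as a small API layered over an ordinary model. The sketch below is illustrative only and not taken from the paper; the dataset and column names are hypothetical, and it assumes Fairlearn's MetricFrame, selection_rate, and demographic_parity_difference APIs as found in recent releases.

```python
# Illustrative sketch of a toolkit-style fairness check (hypothetical data).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import (
    MetricFrame,
    selection_rate,
    demographic_parity_difference,
)

# Hypothetical tabular data with a binary label and a sensitive attribute.
data = pd.DataFrame({
    "feature_1": [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3, 0.8],
    "feature_2": [1, 0, 1, 0, 1, 1, 0, 0],
    "sex":       ["F", "M", "F", "M", "F", "M", "F", "M"],
    "label":     [0, 1, 0, 1, 1, 1, 0, 1],
})
X, y = data[["feature_1", "feature_2"]], data["label"]

# Train an ordinary classifier and score it on the same (toy) data.
pred = LogisticRegression().fit(X, y).predict(X)

# Disaggregate accuracy and selection rate by the sensitive attribute.
frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y,
    y_pred=pred,
    sensitive_features=data["sex"],
)
print(frame.by_group)

# Single summary disparity: the gap in selection rates between groups.
print(demographic_parity_difference(y, pred, sensitive_features=data["sex"]))
```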

Supplementary Material

Supplementary Materials (3411764.3445604_supplementalmaterials.zip)

        Published In

        CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
        May 2021
        10862 pages
        ISBN: 9781450380966
        DOI: 10.1145/3411764
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. AI
        2. ML
        3. algorithmic bias
        4. ethics
        5. fairness
        6. machine learning fairness
        7. user-centric evaluation

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        CHI '21

        Acceptance Rates

        Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
