DOI: 10.1145/3411764.3445604
Research Article

Towards Fairness in Practice: A Practitioner-Oriented Rubric for Evaluating Fair ML Toolkits

Published: 07 May 2021

ABSTRACT

To support fairness-forward thinking by machine learning (ML) practitioners, fairness researchers have created toolkits that aim to transform state-of-the-art research contributions into easily accessible APIs. Despite these efforts, recent research indicates a disconnect between the needs of practitioners and the tools offered by fairness research. By engaging 20 ML practitioners in a simulated scenario in which they use fairness toolkits to make critical decisions, this work draws on practitioner feedback to inform recommendations for the design and creation of fair ML toolkits. Our survey and interview results indicate that although fair ML toolkits strongly influence users' decision-making, their design and their presentation of fairness results leave much to be desired. To support the future development and evaluation of toolkits, this work offers a rubric that can be used to identify the critical components of fair ML toolkits.
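To ground the kind of "easily accessible API" the abstract describes, below is a minimal, hypothetical sketch of a fairness audit using Fairlearn, one of the toolkits in this space. MetricFrame and demographic_parity_difference are part of Fairlearn's public metrics API, but the data here is synthetic and exact signatures may vary across library versions; treat this as an illustration, not the paper's method.

# A minimal sketch of a fairness-toolkit API of the kind described above,
# using Fairlearn's metrics module on synthetic data (an assumption for
# illustration; not code from the paper).
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)        # ground-truth labels
y_pred = rng.integers(0, 2, size=1000)        # a model's predictions
group = rng.choice(["A", "B"], size=1000)     # a sensitive attribute

# Slice a standard metric by group: per-group accuracy and the largest gap.
mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                 sensitive_features=group)
print(mf.by_group)       # accuracy for group A vs. group B
print(mf.difference())   # largest between-group accuracy gap

# Difference in selection rates between groups (demographic parity).
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))

Numbers like the between-group gaps above are exactly the "fairness results" whose design and demonstration the abstract argues leave much to be desired.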

Published in

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021, 10862 pages
ISBN: 9781450380966
DOI: 10.1145/3411764

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
