DOI: 10.1145/3411764.3445604
Research Article

Towards Fairness in Practice: A Practitioner-Oriented Rubric for Evaluating Fair ML Toolkits

Published: 07 May 2021

ABSTRACT

To support fairness-forward thinking by machine learning (ML) practitioners, fairness researchers have created toolkits that aim to transform state-of-the-art research contributions into easily accessible APIs. Despite these efforts, recent research indicates a disconnect between the needs of practitioners and the tools offered by fairness research. By engaging 20 ML practitioners in a simulated scenario in which they use fairness toolkits to make critical decisions, this work draws on practitioner feedback to inform recommendations for the design and creation of fair ML toolkits. Our survey and interview results indicate that although fair ML toolkits strongly influence users' decision-making, their design and their presentation of fairness results leave much to be desired. To support the future development and evaluation of toolkits, this work offers a rubric that can be used to identify the critical components of fair ML toolkits.
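To ground the kind of "easily accessible API" the abstract describes, below is a minimal, hypothetical sketch of a fairness audit using Fairlearn, one of the toolkits in this space. MetricFrame and demographic_parity_difference are part of Fairlearn's public metrics API, but the data here is synthetic and exact signatures may vary across library versions; treat this as an illustration, not the paper's method.

# A minimal sketch of a fairness-toolkit API of the kind described above,
# using Fairlearn's metrics module on synthetic data (an assumption for
# illustration; not code from the paper).
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)        # ground-truth labels
y_pred = rng.integers(0, 2, size=1000)        # a model's predictions
group = rng.choice(["A", "B"], size=1000)     # a sensitive attribute

# Slice a standard metric by group: per-group accuracy and the largest gap.
mf = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                 sensitive_features=group)
print(mf.by_group)       # accuracy for group A vs. group B
print(mf.difference())   # largest between-group accuracy gap

# Difference in selection rates between groups (demographic parity).
print(demographic_parity_difference(y_true, y_pred, sensitive_features=group))

Numbers like the between-group gaps above are exactly the "fairness results" whose design and demonstration the abstract argues leave much to be desired.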

Published in

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021, 10862 pages
ISBN: 9781450380966
DOI: 10.1145/3411764

Copyright © 2021 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
