ABSTRACT
To support fairness-forward thinking by machine learning (ML) practitioners, fairness researchers have created toolkits that aim to transform state-of-the-art research contributions into easily accessible APIs. Despite these efforts, recent research indicates a disconnect between practitioners' needs and the tools that fairness research offers. By engaging 20 ML practitioners in a simulated scenario in which they use fairness toolkits to make critical decisions, this work draws on practitioner feedback to inform recommendations for the design and creation of fair ML toolkits. Survey and interview data indicate that although fair ML toolkits strongly influence users' decision-making, much is left to be desired in the design and presentation of fairness results. To support the future development and evaluation of toolkits, this work offers a rubric that can be used to identify critical components of fair ML toolkits.
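To make concrete the kind of computation these toolkits wrap into accessible APIs, the sketch below implements one common group-fairness metric, demographic parity difference, in plain Python. This is an illustrative, self-contained example, not code from any of the toolkits studied; the function name and interface are our own.

```python
def demographic_parity_difference(y_pred, groups):
    """Difference between the largest and smallest per-group
    selection rates (fraction of positive predictions).
    A value of 0 indicates demographic parity across groups."""
    counts = {}  # group -> (num examples, num positive predictions)
    for pred, g in zip(y_pred, groups):
        n, pos = counts.get(g, (0, 0))
        counts[g] = (n + 1, pos + (1 if pred == 1 else 0))
    rates = [pos / n for n, pos in counts.values()]
    return max(rates) - min(rates)

# Example: group "a" is selected at rate 1.0, group "b" at rate 0.5.
preds  = [1, 1, 1, 0]
groups = ["a", "a", "b", "b"]
print(demographic_parity_difference(preds, groups))  # → 0.5
```

Production toolkits such as those discussed in this work typically expose similar metrics alongside visualizations and mitigation algorithms; the point of the sketch is only to show the underlying computation a practitioner is asked to interpret.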
Towards Fairness in Practice: A Practitioner-Oriented Rubric for Evaluating Fair ML Toolkits