ABSTRACT
As machine learning models become more widely used in important decision-making processes, the need to identify and mitigate potential sources of bias has grown substantially. Using two-distribution (specified complexity) hypothesis tests, we identify biases in training data with respect to proposed distributions, without needing to train a model; this distinguishes our methods from common output-based fairness tests. Furthermore, our methods can return a "closest plausible explanation" for a given dataset, potentially revealing underlying biases in the processes that generated it. We also show that a binomial variation of this hypothesis test can identify bias in particular directions, or toward particular outcomes, and likewise return a closest plausible explanation. We compare this binomial variation with other hypothesis tests, including the exact binomial test. Lastly, we demonstrate potential industrial applications of our methods on two real-world datasets.
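The core test statistic is simple enough to sketch. Assuming the kardis form of the specified-complexity test (Montañez 2018), a proposed distribution p and a nonnegative specification function ν with normalizer r = Σ_x ν(x) give κ(x) = r · p(x) / ν(x), and P(κ ≤ α) ≤ α under p, so a tiny κ rejects p as a plausible explanation. A minimal Python illustration (the function names and the toy labeling example are ours, not taken from the paper) might look like:

```python
import math

def kardis(r, p_x, nu_x):
    """Two-distribution test statistic: kappa = r * p(x) / nu(x).

    p_x  : probability of the observed data x under the proposed distribution
    nu_x : value of the nonnegative specification function at x
    r    : normalizer, the sum of nu over the whole outcome space

    Under the proposed distribution, P(kappa <= alpha) <= alpha, so kappa
    itself bounds the p-value for rejecting that distribution.
    """
    return r * p_x / nu_x

def binomial_upper_p(k, n, p0):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p0).

    Tests for bias *toward* one outcome: k favorable labels out of n, where
    the proposed unbiased process assigns each label probability p0.
    """
    return sum(math.comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

# Toy dataset: 20 binary labels, 19 of them favorable to one group.
# Proposed explanation: an unbiased coin, so p(x) = 2**-20 for every labeling.
# Specification: indicator of "at least 18 favorable labels", so
# r = C(20,18) + C(20,19) + C(20,20) = 211.
n = 20
r = math.comb(n, 18) + math.comb(n, 19) + math.comb(n, 20)  # 211
kappa = kardis(r, 2.0**-n, 1.0)       # ~2.0e-4: reject the coin at alpha = 0.001
p_val = binomial_upper_p(19, n, 0.5)  # directional test toward "favorable"
```

Here κ ≈ 2.0e-4 rejects the fair-coin explanation at α = 0.001; the binomial variant additionally localizes the bias to the "favorable" direction. In practice one would then search nearby distributions for the closest one the test fails to reject, yielding the closest plausible explanation.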
Index Terms
- Identifying Bias in Data Using Two-Distribution Hypothesis Tests