Research article · Public Access · DOI: 10.1145/3514094.3534169

Identifying Bias in Data Using Two-Distribution Hypothesis Tests

Published: 27 July 2022

ABSTRACT

As machine learning models become more widely used in important decision-making processes, the need to identify and mitigate potential sources of bias has increased substantially. Using two-distribution (specified complexity) hypothesis tests, we identify biases in training data with respect to proposed distributions, without the need to train a model, distinguishing our methods from common output-based fairness tests. Furthermore, our methods allow us to return a "closest plausible explanation" for a given dataset, potentially revealing underlying biases in the process that generated it. We also show that a binomial variation of this hypothesis test can be used to identify bias in particular directions, or towards particular outcomes, and again return a closest plausible explanation. We compare the benefits of this binomial variation with those of other hypothesis tests, including the exact binomial test. Lastly, we demonstrate potential industrial applications of our methods on two real-world datasets.
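As a concrete point of reference for the exact binomial test mentioned above, the short Python sketch below checks whether the rate of favorable outcomes observed for one group in a labeled dataset is plausible under a proposed rate, and reports the range of rates that remain plausible when it is not. The dataset, column names, proposed rate, and significance level are illustrative assumptions; the sketch shows only the exact-binomial baseline that the abstract compares against, not the paper's two-distribution (specified complexity) test itself.

# Illustrative sketch only (not the paper's method): an exact binomial test
# of whether the observed rate of favorable outcomes for one group matches
# a proposed rate. Dataset, column names, rate, and alpha are assumptions.
import pandas as pd
from scipy.stats import binomtest

ALPHA = 0.05          # significance level (assumed)
PROPOSED_RATE = 0.5   # proposed probability of a favorable outcome (assumed)

# Hypothetical data: a binary group attribute and a binary outcome label.
df = pd.DataFrame({
    "group":   ["A", "A", "B", "B", "A", "B", "A", "B", "A", "A"],
    "outcome": [  1,   0,   0,   0,   1,   0,   1,   0,   1,   1],
})

group_a = df[df["group"] == "A"]
k = int(group_a["outcome"].sum())  # favorable outcomes observed for group A
n = len(group_a)                   # observations for group A

# Exact (two-sided) binomial test against the proposed rate.
result = binomtest(k, n, p=PROPOSED_RATE, alternative="two-sided")
print(f"observed rate = {k / n:.2f}, p-value = {result.pvalue:.4f}")

if result.pvalue < ALPHA:
    # When the proposed rate is rejected, the exact confidence interval gives
    # the rates that remain plausible, loosely analogous to reporting a
    # "closest plausible explanation" for the data.
    ci = result.proportion_ci(confidence_level=1 - ALPHA)
    print(f"proposed rate {PROPOSED_RATE} rejected; plausible rates in "
          f"[{ci.low:.2f}, {ci.high:.2f}]")
else:
    print(f"proposed rate {PROPOSED_RATE} not rejected at alpha = {ALPHA}")

Because such a test conditions only on counts, it can be run directly on the training data before any model is fit, which mirrors the data-level (rather than output-based) focus described in the abstract.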


Supplemental Material

aies134.mp4 (MP4, 99.2 MB)



Published in

AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society
July 2022, 939 pages
ISBN: 9781450392471
DOI: 10.1145/3514094
Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 61 of 162 submissions, 38%
