Detecting Simpson’s Paradox: A Step Towards Fairness in Machine Learning

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2022)

Abstract

Over the last two decades, artificial intelligence (AI) and machine learning (ML) have grown tremendously. However, understanding and assessing the impact of causality and statistical paradoxes remain critical challenges in both fields. These topics are now widely discussed in the context of explainable AI (XAI) and algorithmic fairness, yet they have not reached mainstream AI and ML application development. In this paper, we first discuss the impact of Simpson’s paradox on linear trends, i.e., on continuous values, and demonstrate its effects on three benchmark training datasets used in ML. Next, we provide an algorithm for detecting Simpson’s paradox. We applied the algorithm to the three datasets, and it appears beneficial for detecting cases of Simpson’s paradox in continuous values. In the future, the algorithm could be used in designing next-generation platforms for fairness in ML.
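To make the idea of Simpson’s paradox on linear trends concrete, the following is a minimal sketch and not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes the paradox is flagged whenever the sign of the least-squares slope in a subgroup is opposite to the sign of the slope over the pooled data. The function names `slope`, `sign` and `detect_reversal` and the toy data are hypothetical and serve illustration only.

```python
from collections import defaultdict


def slope(xs, ys):
    """Least-squares slope of y on x; returns 0.0 if x has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx if sxx else 0.0


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def detect_reversal(rows):
    """rows: iterable of (group, x, y) triples.

    Returns the pooled slope, the per-group slopes, and the sorted list of
    groups whose trend has the opposite sign to the pooled trend.
    """
    rows = list(rows)
    overall = slope([x for _, x, _ in rows], [y for _, _, y in rows])

    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    group_slopes = {
        group: slope([x for x, _ in pts], [y for _, y in pts])
        for group, pts in by_group.items()
    }
    reversed_groups = sorted(
        group for group, s in group_slopes.items()
        if sign(s) * sign(overall) == -1
    )
    return overall, group_slopes, reversed_groups


# Hypothetical toy data: y rises with x inside each group,
# but falls with x once the two groups are pooled.
toy_rows = [
    ("A", 1, 8), ("A", 2, 9), ("A", 3, 10),
    ("B", 6, 1), ("B", 7, 2), ("B", 8, 3),
]
overall, group_slopes, reversed_groups = detect_reversal(toy_rows)
print(overall, group_slopes, reversed_groups)
```

In this toy example both groups show an increasing trend while the pooled trend is decreasing, so both are reported as reversed. A practical detector would presumably also test whether each slope differs significantly from zero before reporting a reversal; that check is omitted here.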

This work has been partially conducted in the project “ICT programme” which was supported by the European Union through the European Social Fund.

Corresponding author

Correspondence to Rahul Sharma.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharma, R. et al. (2022). Detecting Simpson’s Paradox: A Step Towards Fairness in Machine Learning. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_7

  • DOI: https://doi.org/10.1007/978-3-031-15743-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15742-4

  • Online ISBN: 978-3-031-15743-1

  • eBook Packages: Computer Science, Computer Science (R0)