Analysis of Graphical Causal Models with Discretized Data

  • Conference paper
Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2022)

Abstract

In several fields, sample data are observed at discrete rather than continuous levels. For example, in psychology an individual’s disease level is typically recorded as ‘mild’, ‘moderate’ or ‘strong’, while the underlying intensity of the mental disorder is potentially a continuous variable. The implications of such discretization in linear regression are well known: uncertainty increases and estimated causal relations become biased and inconsistent. For more complex models, the implications of discretization have not been studied theoretically. This paper presents an empirical study of complex models in which causal relationships are unknown, some variables are discretized, and graphical causal models are used to estimate causal relationships and effects. We study the implications of discretization for the obtained results using simulations. We show that discretization affects both the correct estimation of causal relations and the uncertainty of the estimated causal relations between discretized and non-discretized variables. In addition, we show that discretization influences estimated causal effects, and we relate this influence to the properties of the discretized data and the sample size.
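The linear regression case mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's simulation design; the data-generating process, the coefficient values, the tertile cut points and the function name `simulate_slope` are all assumptions chosen for illustration. It regresses an outcome on a continuous regressor and a correlated control variable, once with the control observed continuously and once with the control discretized into three levels.

```python
import numpy as np


def simulate_slope(n, discretize, beta1=1.0, beta2=1.0, rho=0.6, seed=0):
    """Return the OLS coefficient on x1 when x2 is (optionally) discretized.

    Hypothetical setup: y = beta1 * x1 + beta2 * x2 + noise, where x1 and x2
    are standard normal with correlation rho.
    """
    rng = np.random.default_rng(seed)
    x2 = rng.standard_normal(n)
    x1 = rho * x2 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    y = beta1 * x1 + beta2 * x2 + rng.standard_normal(n)

    if discretize:
        # Replace x2 by a three-level score (0, 1, 2) using sample tertiles,
        # mimicking a 'mild' / 'moderate' / 'strong' observation scale.
        cuts = np.quantile(x2, [1.0 / 3.0, 2.0 / 3.0])
        x2 = np.digitize(x2, cuts).astype(float)

    # Ordinary least squares with an intercept.
    design = np.column_stack([np.ones(n), x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        continuous = simulate_slope(n, discretize=False)
        discretized = simulate_slope(n, discretize=True)
        print(f"n={n:>7}: x1 slope with continuous x2 = {continuous:.3f}, "
              f"with discretized x2 = {discretized:.3f}")
```

Under these assumptions, the slope on `x1` approaches `beta1` when `x2` is observed continuously. With `x2` discretized, the slope stays above `beta1` as `n` grows, because the three-level score only partly controls for `x2`. This is one simple form of the inconsistency the abstract refers to.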

We thank David JeanJean for his preliminary research on this project. Baştürk, Hanoch and Habtewold are financially supported by the Netherlands Organization for Scientific Research (NWO) under grant number 195.187.

Notes

  1. Similar conclusions with more involved derivations hold when a constant is included and \(X_1, X_2, X_3\) are matrices.

Author information

Corresponding author

Correspondence to Rui Jorge Almeida.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Hanoch, O., Baştürk, N., Almeida, R.J., Habtewold, T.D. (2022). Analysis of Graphical Causal Models with Discretized Data. In: Ciucci, D., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2022. Communications in Computer and Information Science, vol 1602. Springer, Cham. https://doi.org/10.1007/978-3-031-08974-9_18

  • DOI: https://doi.org/10.1007/978-3-031-08974-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08973-2

  • Online ISBN: 978-3-031-08974-9

  • eBook Packages: Computer Science, Computer Science (R0)
