Abstract
In several fields, sample data are observed at discrete rather than continuous levels. For example, in psychology an individual’s disease level is typically observed as ‘mild’, ‘moderate’ or ‘strong’, while the underlying mental disorder intensity is potentially a continuous variable. The implications of such discretization in linear regression are well known: uncertainty increases and estimated causal relations become biased and inconsistent. For more complex models, the implications of discretization have not been studied theoretically. This paper presents an empirical study of complex models in which causal relationships are unknown, some variables are discretized, and graphical causal models are used to estimate causal relationships and effects. We study the implications of discretization for the obtained results using simulations. We show that discretization affects both the correct estimation of causal relations and the uncertainty of the estimated causal relations between discretized and non-discretized variables. In addition, we show that discretization influences estimated causal effects, and we relate this influence to the properties of the discretized data and to the sample size.
We thank David JeanJean for his preliminary research on this project. Baştürk, Hanoch and Habtewold are financially supported by the Netherlands Organization for Scientific Research (NWO) under grant number 195.187.
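The linear-regression claim in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the data-generating process, the correlation `rho`, the cut points at -0.5 and 0.5, the sample size and the function name `simulate` are all assumptions chosen for illustration. It regresses an outcome on two correlated regressors, once with both observed continuously and once with the second replaced by a three-level discretized version.

```python
import numpy as np


def simulate(n=20000, rho=0.6, beta1=1.0, beta2=1.0, seed=0):
    """Return the OLS estimate of beta1 with x2 continuous and with x2 discretized.

    Hypothetical setup (not taken from the paper): x1 and x2 are standard
    normal with correlation rho, and y = beta1*x1 + beta2*x2 + noise.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    x1, x2 = x[:, 0], x[:, 1]
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    # Three observed levels (0, 1, 2), e.g. 'mild', 'moderate', 'strong'.
    # The cut points are an arbitrary assumption.
    x2_disc = np.digitize(x2, bins=[-0.5, 0.5]).astype(float)

    def ols(*cols):
        # Least-squares fit with an intercept; returns all coefficients.
        design = np.column_stack([np.ones(n), *cols])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return coef

    b1_continuous = ols(x1, x2)[1]
    b1_discretized = ols(x1, x2_disc)[1]
    return b1_continuous, b1_discretized


if __name__ == "__main__":
    b_cont, b_disc = simulate()
    print(f"beta1 estimate, x2 continuous:  {b_cont:.3f}")
    print(f"beta1 estimate, x2 discretized: {b_disc:.3f}")
```

Under these assumptions the discretized regressor no longer fully captures the variation in `x2`, so part of its effect is attributed to the correlated `x1`. Because this comes from the information lost in discretization and not from sampling noise, increasing `n` does not remove the discrepancy.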
Notes
1. Similar conclusions with more involved derivations hold when a constant is included and \(X_1, X_2, X_3\) are matrices.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Hanoch, O., Baştürk, N., Almeida, R.J., Habtewold, T.D. (2022). Analysis of Graphical Causal Models with Discretized Data. In: Ciucci, D., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2022. Communications in Computer and Information Science, vol 1602. Springer, Cham. https://doi.org/10.1007/978-3-031-08974-9_18
DOI: https://doi.org/10.1007/978-3-031-08974-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08973-2
Online ISBN: 978-3-031-08974-9
eBook Packages: Computer Science, Computer Science (R0)