Abstract
Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip random forest classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
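To illustrate the marginalization the abstract refers to, the following is a minimal sketch (not the paper's implementation; all class and variable names are illustrative) of exact inference in a toy smooth and decomposable probabilistic circuit over two binary variables. A missing variable is marginalized out simply by having its leaf nodes evaluate to 1, which is what makes this operation linear in the circuit size:

```python
# Minimal sketch of exact (marginal) inference in a toy probabilistic
# circuit (PC). Structure and names are illustrative, not from the paper.

class Leaf:
    def __init__(self, var, prob_true):
        self.var, self.p = var, prob_true

    def value(self, assignment):
        # Marginalize a missing variable by evaluating its leaves to 1.
        if assignment.get(self.var) is None:
            return 1.0
        return self.p if assignment[self.var] else 1.0 - self.p

class Sum:
    def __init__(self, weights, children):
        self.weights, self.children = weights, children

    def value(self, assignment):
        # Convex combination of the children's values.
        return sum(w * c.value(assignment)
                   for w, c in zip(self.weights, self.children))

class Product:
    def __init__(self, children):
        self.children = children

    def value(self, assignment):
        # Decomposable product: children range over disjoint variables.
        v = 1.0
        for c in self.children:
            v *= c.value(assignment)
        return v

# Mixture of two fully factorized distributions over binaries X and Y.
pc = Sum([0.4, 0.6], [
    Product([Leaf("X", 0.9), Leaf("Y", 0.2)]),
    Product([Leaf("X", 0.1), Leaf("Y", 0.7)]),
])

joint = pc.value({"X": True, "Y": True})     # P(X=1, Y=1)
marginal = pc.value({"X": True, "Y": None})  # P(X=1), Y marginalized out
print(joint, marginal)
```

Note that the marginal returned by a single bottom-up pass agrees with explicitly summing the joint over the missing variable, which is exactly the tractability property the paper relies on.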
Supported by CAPES Finance Code 001 and CNPq grant #304012/2019-0.
Notes
1. Smoothness is also called completeness in the context of Sum-Product Networks.
2. Determinism is also called selectivity in the context of Sum-Product Networks.
3. By contributing to the PC value, we mean that there is a path from the root to \(\mathtt {M_j}\) where each node evaluates to a positive value for the given instance.
References
Antonucci, A., Piatti, A.: Modeling unreliable observations in Bayesian networks by credal networks. In: Proceedings of the Third International Conference on Scalable Uncertainty Management (SUM), pp. 28–39 (2009)
Antonucci, A., Zaffalon, M.: Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks. Int. J. Approximate Reasoning 49(2), 345–361 (2008)
Azur, M.J., Stuart, E.A., Frangakis, C., Leaf, P.J.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011)
Choi, Y., Vergari, A., Van den Broeck, G.: Probabilistic circuits: a unifying framework for tractable probabilistic models (2020)
Correia, A.H.C., Peharz, R., de Campos, C.P.: Joints in random forests. In: Advances in Neural Information Processing Systems 33 (NeurIPS) (2020)
Davis, J., Domingos, P.: Bottom-up learning of Markov network structure. In: Proceedings of the 27th International Conference on Machine Learning (ICML), pp. 271–280 (2010)
Khosravi, P., Choi, Y., Liang, Y., Vergari, A., Van den Broeck, G.: On tractable computation of expected predictions. In: Advances in Neural Information Processing Systems 32 (NeurIPS) (2019)
Khosravi, P., Liang, Y., Choi, Y., Van den Broeck, G.: What to expect of classifiers? Reasoning about logistic regression with missing features. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI) (2019)
Kisa, D., Van den Broeck, G., Choi, A., Darwiche, A.: Probabilistic sentential decision diagrams. In: Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), pp. 1–10 (2014)
Levray, A., Belle, V.: Learning credal sum-product networks. In: Proceedings of the 2nd Conference on Automated Knowledge Base Construction (2020)
Liang, Y., Van den Broeck, G.: Learning logistic circuits. In: Proceedings of the 33rd Conference on Artificial Intelligence (AAAI) (2019)
Llerena, J.V., Mauá, D.D.: Efficient algorithms for robustness analysis of maximum a posteriori inference in selective sum-product networks. Int. J. Approximate Reasoning 126, 158–180 (2020)
Manski, C.F.: Partial identification with missing data: concepts and findings. Int. J. Approximate Reasoning 39(2–3), 151–165 (2005)
Marlin, B.M., Zemel, R.S., Roweis, S.T., Slaney, M.: Recommender systems: missing data and statistical model estimation. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI) (2011)
Mauá, D.D., De Campos, C.P., Benavoli, A., Antonucci, A.: Probabilistic inference in credal networks: new complexity results. J. Artif. Intell. Res. 50, 603–637 (2014)
Mauá, D.D., Conaty, D., Cozman, F.G., Poppenhaeger, K., de Campos, C.P.: Robustifying sum-product networks. Int. J. Approximate Reasoning 101, 163–180 (2018)
Mohan, K., Pearl, J., Tian, J.: Graphical models for inference with missing data. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 1277–1285 (2013)
Peharz, R., Gens, R., Domingos, P.: Learning selective sum-product networks. In: Proceedings of the Workshop on Learning Tractable Probabilistic Models (2014)
Peharz, R., Gens, R., Pernkopf, F., Domingos, P.: On the latent variable interpretation in sum-product networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(10), 2030–2044 (2017)
Peharz, R., et al.: Random sum-product networks: a simple and effective approach to probabilistic deep learning. In: Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI) (2020)
Poon, H., Domingos, P.: Sum-product networks: a new deep architecture. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 337–346 (2011)
Rahman, T., Kothalkar, P., Gogate, V.: Cutset networks: a simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp. 630–645 (2014)
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Shao, X., Molina, A., Vergari, A., Stelzner, K., Peharz, R., Liebig, T., Kersting, K.: Conditional sum-product networks: imposing structure on deep probabilistic architectures. In: Proceedings of the 10th International Conference on Probabilistic Graphical Models (PGM) (2020)
Shen, Y., Choi, A., Darwiche, A.: A tractable probabilistic model for subset selection. In: Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI) (2017)
Shen, Y., Goyanka, A., Darwiche, A., Choi, A.: Structured Bayesian networks: from inference to learning with routes. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) (2019)
Shin, J., Wu, S., Wang, F., De Sa, C., Zhang, C., Ré, C.: Incremental knowledge base construction using DeepDive. In: Proceedings of the VLDB Endowment (2015)
Zaffalon, M.: Conservative rules for predictive inference with incomplete data. In: Proceedings of the 4th International Symposium on Imprecise Probabilities and Their Applications (ISIPTA), pp. 406–415 (2005)
Zaffalon, M., Corani, G., Mauá, D.: Evaluating credal classifiers by utility-discounted predictive accuracy. Int. J. Approximate Reasoning 53(8), 1282–1301 (2012)
Zaffalon, M., Miranda, E.: Conservative inference rule for uncertain reasoning under incompleteness. J. Artif. Intell. Res. 34, 757–821 (2009)
Zheng, K., Pronobis, A., Rao, R.P.N.: Learning graph-structured sum-product networks for probabilistic semantic maps. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) (2018)
Proof of Theorem 1
Membership in coNP is trivial: given a configuration \(\mathbf {u}\), we can compute \(\mathtt {M}(y',\mathbf {o},\mathbf {u})\) and \(\mathtt {M}(y'',\mathbf {o},\mathbf {u})\) in linear time and decide the sign of their difference in constant time. Hence we have a polynomial certificate that an instance is not in the language.
We show hardness by reduction from the subset-sum problem: given positive integers \(z_1,\dotsc ,z_n\) and a positive integer target \(s\), decide whether there is a configuration \(\mathbf {u} \in \{0,1\}^n\) such that
$$\sum _{i=1}^{n} z_i u_i = s. \qquad (11)$$
To solve that problem, set \(v_i := z_i/s\) and build a tree-shaped deterministic PC as shown above, where the \(U_i\) are binary variables, \(P_1(\mathbf {u}) = \prod _i e^{-2v_iu_i}\) and \(P_2(\mathbf {u})=\prod _i e^{-v_iu_i}\). Note that the PC is not class-factorized. Use the PC to compute
$$\delta (y',y'') := \min _{\mathbf {u}} \bigl [\, a\,P_1(\mathbf {u}) - b\,P_2(\mathbf {u}) \,\bigr ] = \min _{\mathbf {u}} \bigl [\, a e^{-2\sum _i v_i u_i} - b e^{-\sum _i v_i u_i} \,\bigr ].$$
If we call \(x:=\exp (-\sum _i v_i u_i)\), the above expression is bounded below by the minimum for positive x of \(f(x):=ax^2 - bx\). The function f is strictly convex and minimized at \(x=b/(2a)\). Selecting a and b such that \(b/(2a)=e^{-1}\) makes the minimum occur at \(\sum _i v_iu_i = 1\). Thus, there is a solution to (11) if and only if \(\delta (y',y'') \le -ae^{-2}\). As stated, the proof is not quite valid because the distributions \(P_1(\mathbf {u})\) and \(P_2(\mathbf {u})\) use non-rational numbers. However, we can use the same strategy as in the proof of Theorem 5 in [16] and exploit the rational gap between yes and no instances of the original problem to encode rational approximations of \(P_1\) and \(P_2\) of polynomial size. \(\square \)
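As a sanity check on the convexity argument, the minimization can be worked out explicitly; the particular choice of b below is only illustrative, any a, b with \(b/(2a)=e^{-1}\) works:

$$f(x) = ax^2 - bx, \qquad f'(x) = 2ax - b = 0 \iff x = \frac{b}{2a}.$$

Choosing, e.g., \(b = 2ae^{-1}\) gives \(b/(2a) = e^{-1}\), so the minimum over positive x is

$$f(e^{-1}) = ae^{-2} - be^{-1} = ae^{-2} - 2ae^{-2} = -ae^{-2},$$

and it is attained by some configuration \(\mathbf {u}\) exactly when \(x = e^{-\sum _i v_i u_i} = e^{-1}\), i.e., when \(\sum _i v_i u_i = 1\).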
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Llerena, J.V., Mauá, D.D., Antonucci, A. (2021). Cautious Classification with Data Missing Not at Random Using Generative Random Forests. In: Vejnarová, J., Wilson, N. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2021. Lecture Notes in Computer Science(), vol 12897. Springer, Cham. https://doi.org/10.1007/978-3-030-86772-0_21
DOI: https://doi.org/10.1007/978-3-030-86772-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86771-3
Online ISBN: 978-3-030-86772-0