Abstract
Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip random forest classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
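To illustrate the marginalization the abstract refers to, the following is a minimal sketch (not the paper's implementation; all class and variable names are illustrative) of exact inference in a toy smooth and decomposable probabilistic circuit over two binary variables. A missing variable is marginalized out simply by having its leaf nodes evaluate to 1, which is what makes this operation linear in the circuit size:

```python
# Minimal sketch of exact (marginal) inference in a toy probabilistic
# circuit (PC). Structure and names are illustrative, not from the paper.

class Leaf:
    def __init__(self, var, prob_true):
        self.var, self.p = var, prob_true

    def value(self, assignment):
        # Marginalize a missing variable by evaluating its leaves to 1.
        if assignment.get(self.var) is None:
            return 1.0
        return self.p if assignment[self.var] else 1.0 - self.p

class Sum:
    def __init__(self, weights, children):
        self.weights, self.children = weights, children

    def value(self, assignment):
        # Convex combination of the children's values.
        return sum(w * c.value(assignment)
                   for w, c in zip(self.weights, self.children))

class Product:
    def __init__(self, children):
        self.children = children

    def value(self, assignment):
        # Decomposable product: children range over disjoint variables.
        v = 1.0
        for c in self.children:
            v *= c.value(assignment)
        return v

# Mixture of two fully factorized distributions over binaries X and Y.
pc = Sum([0.4, 0.6], [
    Product([Leaf("X", 0.9), Leaf("Y", 0.2)]),
    Product([Leaf("X", 0.1), Leaf("Y", 0.7)]),
])

joint = pc.value({"X": True, "Y": True})     # P(X=1, Y=1)
marginal = pc.value({"X": True, "Y": None})  # P(X=1), Y marginalized out
print(joint, marginal)
```

Note that the marginal returned by a single bottom-up pass agrees with explicitly summing the joint over the missing variable, which is exactly the tractability property the paper relies on.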
Supported by CAPES Finance Code 001 and CNPq grant #304012/2019-0.
Notes
1. Smoothness is also called completeness in the context of Sum-Product Networks.
2. Determinism is also called selectivity in the context of Sum-Product Networks.
3. By contributing to the PC value, we mean that there is a path from the root to \(\mathtt {M_j}\) where each node evaluates to a positive value for the given instance.
References
Antonucci, A., Piatti, A.: Modeling unreliable observations in Bayesian networks by credal networks. In: Proceedings of the Third International Conference on Scalable Uncertainty Management (SUM), pp. 28–39 (2009)
Antonucci, A., Zaffalon, M.: Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks. Int. J. Approximate Reasoning 49(2), 345–361 (2008)
Azur, M.J., Stuart, E.A., Frangakis, C., Leaf, P.J.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011)
Choi, Y., Vergari, A., Van den Broeck, G.: Probabilistic circuits: a unifying framework for tractable probabilistic models (2020)
Correia, A.H.C., Peharz, R., de Campos, C.P.: Joints in random forests. In: Advances in Neural Information Processing Systems 33 (NeurIPS) (2020)
Davis, J., Domingos, P.: Bottom-up learning of Markov network structure. In: Proceedings of the 27th International Conference on Machine Learning (ICML), pp. 271–280 (2010)
Khosravi, P., Choi, Y., Liang, Y., Vergari, A., Van den Broeck, G.: On tractable computation of expected predictions. In: Advances in Neural Information Processing Systems 32 (NeurIPS) (2019)
Khosravi, P., Liang, Y., Choi, Y., Van den Broeck, G.: What to expect of classifiers? Reasoning about logistic regression with missing features. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI) (2019)
Kisa, D., Van den Broeck, G., Choi, A., Darwiche, A.: Probabilistic sentential decision diagrams. In: Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), pp. 1–10 (2014)
Levray, A., Belle, V.: Learning credal sum-product networks. In: Proceedings of the 2nd Conference on Automated Knowledge Base Construction (2020)
Liang, Y., Van den Broeck, G.: Learning logistic circuits. In: Proceedings of the 33rd Conference on Artificial Intelligence (AAAI) (2019)
Llerena, J.V., Mauá, D.D.: Efficient algorithms for robustness analysis of maximum a posteriori inference in selective sum-product networks. Int. J. Approximate Reasoning 126, 158–180 (2020)
Manski, C.F.: Partial identification with missing data: concepts and findings. Int. J. Approximate Reasoning 39(2–3), 151–165 (2005)
Marlin, B.M., Zemel, R.S., Roweis, S.T., Slaney, M.: Recommender systems: missing data and statistical model estimation. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI) (2011)
Mauá, D.D., De Campos, C.P., Benavoli, A., Antonucci, A.: Probabilistic inference in credal networks: new complexity results. J. Artif. Intell. Res. 50, 603–637 (2014)
Mauá, D.D., Conaty, D., Cozman, F.G., Poppenhaeger, K., de Campos, C.P.: Robustifying sum-product networks. Int. J. Approximate Reasoning 101, 163–180 (2018)
Mohan, K., Pearl, J., Tian, J.: Graphical models for inference with missing data. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 1277–1285 (2013)
Peharz, R., Gens, R., Domingos, P.: Learning selective sum-product networks. In: Proceedings of the Workshop on Learning Tractable Probabilistic Models (2014)
Peharz, R., Gens, R., Pernkopf, F., Domingos, P.: On the latent variable interpretation in sum-product networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(10), 2030–2044 (2017)
Peharz, R., et al.: Random sum-product networks: a simple and effective approach to probabilistic deep learning. In: Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI) (2020)
Poon, H., Domingos, P.: Sum-product networks: a new deep architecture. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 337–346 (2011)
Rahman, T., Kothalkar, P., Gogate, V.: Cutset networks: a simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp. 630–645 (2014)
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Shao, X., Molina, A., Vergari, A., Stelzner, K., Peharz, R., Liebig, T., Kersting, K.: Conditional sum-product networks: imposing structure on deep probabilistic architectures. In: Proceedings of the 10th International Conference on Probabilistic Graphical Models (PGM) (2020)
Shen, Y., Choi, A., Darwiche, A.: A tractable probabilistic model for subset selection. In: Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI) (2017)
Shen, Y., Goyanka, A., Darwiche, A., Choi, A.: Structured Bayesian networks: from inference to learning with routes. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) (2019)
Shin, J., Wu, S., Wang, F., De Sa, C., Zhang, C., Ré, C.: Incremental knowledge base construction using DeepDive. In: Proceedings of the VLDB Endowment (2015)
Zaffalon, M.: Conservative rules for predictive inference with incomplete data. In: Proceedings of the 4th International Symposium on Imprecise Probabilities and Their Applications (ISIPTA), pp. 406–415 (2005)
Zaffalon, M., Corani, G., Mauá, D.: Evaluating credal classifiers by utility-discounted predictive accuracy. Int. J. Approximate Reasoning 53(8), 1282–1301 (2012)
Zaffalon, M., Miranda, E.: Conservative inference rule for uncertain reasoning under incompleteness. J. Artif. Intell. Res. 34, 757–821 (2009)
Zheng, K., Pronobis, A., Rao, R.P.N.: Learning graph-structured sum-product networks for probabilistic semantic maps. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) (2018)
Proof of Theorem 1
Membership in coNP is trivial: given a configuration \(\mathbf {u}\), we can compute \(\mathtt {M}(y',\mathbf {o},\mathbf {u})\) and \(\mathtt {M}(y'',\mathbf {o},\mathbf {u})\) in linear time and decide the sign of their difference in constant time. Hence we have a polynomial certificate that an instance is not in the language.
We show hardness by reduction from the subset-sum problem: given positive integers \(z_1,\dotsc ,z_n\) and a positive integer target \(s\), decide whether there is a configuration \(\mathbf {u} \in \{0,1\}^n\) such that
$$\sum _{i=1}^{n} z_i u_i = s. \qquad (11)$$
To solve that problem, set \(v_i := z_i/s\) and build a tree-shaped deterministic PC as shown above, where the \(U_i\) are binary variables, \(P_1(\mathbf {u}) = \prod _i e^{-2v_iu_i}\) and \(P_2(\mathbf {u})=\prod _i e^{-v_iu_i}\). Note that the PC is not class-factorized. Use the PC to compute
$$\delta (y',y'') := \min _{\mathbf {u}} \bigl [\, a\,P_1(\mathbf {u}) - b\,P_2(\mathbf {u}) \,\bigr ] = \min _{\mathbf {u}} \bigl [\, a e^{-2\sum _i v_i u_i} - b e^{-\sum _i v_i u_i} \,\bigr ].$$
If we call \(x:=\exp (-\sum _i v_i u_i)\), the above expression is bounded below by the minimum for positive x of \(f(x):=ax^2 - bx\). The function f is strictly convex and minimized at \(x=b/(2a)\). Selecting a and b such that \(b/(2a)=e^{-1}\) makes the minimum occur at \(\sum _i v_iu_i = 1\). Thus, there is a solution to (11) if and only if \(\delta (y',y'') \le -ae^{-2}\). As stated, the proof is not quite valid because the distributions \(P_1(\mathbf {u})\) and \(P_2(\mathbf {u})\) use non-rational numbers. However, we can use the same strategy as in the proof of Theorem 5 in [16] and exploit the rational gap between yes and no instances of the original problem to encode rational approximations of \(P_1\) and \(P_2\) of polynomial size. \(\square \)
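As a sanity check on the convexity argument, the minimization can be worked out explicitly; the particular choice of b below is only illustrative, any a, b with \(b/(2a)=e^{-1}\) works:

$$f(x) = ax^2 - bx, \qquad f'(x) = 2ax - b = 0 \iff x = \frac{b}{2a}.$$

Choosing, e.g., \(b = 2ae^{-1}\) gives \(b/(2a) = e^{-1}\), so the minimum over positive x is

$$f(e^{-1}) = ae^{-2} - be^{-1} = ae^{-2} - 2ae^{-2} = -ae^{-2},$$

and it is attained by some configuration \(\mathbf {u}\) exactly when \(x = e^{-\sum _i v_i u_i} = e^{-1}\), i.e., when \(\sum _i v_i u_i = 1\).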
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Llerena, J.V., Mauá, D.D., Antonucci, A. (2021). Cautious Classification with Data Missing Not at Random Using Generative Random Forests. In: Vejnarová, J., Wilson, N. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2021. Lecture Notes in Computer Science(), vol 12897. Springer, Cham. https://doi.org/10.1007/978-3-030-86772-0_21
DOI: https://doi.org/10.1007/978-3-030-86772-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86771-3
Online ISBN: 978-3-030-86772-0