Cautious Classification with Data Missing Not at Random Using Generative Random Forests

  • Conference paper
Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2021)

Abstract

Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip Random Forest classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
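The marginalization the abstract refers to can be sketched on a toy probabilistic circuit. The two-component mixture and all probabilities below are invented for illustration; this is not the paper's model or code. A missing feature is marginalized by evaluating its leaf as 1 (a Bernoulli leaf sums to one over its values):

```python
# Toy probabilistic circuit: a sum node over two factorized components
# of binary features X1, X2. All numbers are hypothetical.
comp_weights = [0.4, 0.6]
p_x1 = [0.9, 0.2]  # P(X1 = 1 | component)
p_x2 = [0.7, 0.3]  # P(X2 = 1 | component)

def leaf(p_one, value):
    """Evaluate a Bernoulli leaf; value=None means marginalized (sums to 1)."""
    if value is None:
        return 1.0
    return p_one if value == 1 else 1.0 - p_one

def circuit(x1, x2):
    """Sum node over product nodes; None marks a missing value."""
    return sum(w * leaf(p1, x1) * leaf(p2, x2)
               for w, p1, p2 in zip(comp_weights, p_x1, p_x2))

# Marginal P(X1 = 1), obtained in one pass by treating X2 as missing,
# equals the explicit sum over X2's values:
marg = circuit(1, None)
assert abs(marg - (circuit(1, 0) + circuit(1, 1))) < 1e-12
```

The single bottom-up pass with missing leaves set to 1 is what makes marginal inference linear in the circuit size; under missing-not-at-random data, however, this marginal can be misleading, which motivates the robustness analysis of the paper.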

Supported by CAPES Finance Code 001 and CNPq grant #304012/2019-0.


Notes

  1. Smoothness is also called completeness in the context of Sum-Product Networks.

  2. Determinism is also called selectivity in the context of Sum-Product Networks.

  3. By contributing to the PC value, we mean that there is a path from the root to \(\mathtt {M_j}\) where each node evaluates to a positive value for the given instance.

  4. http://eigentaste.berkeley.edu/dataset.

References

  1. Antonucci, A., Piatti, A.: Modeling unreliable observations in Bayesian networks by credal networks. In: Proceedings of the Third International Conference on Scalable Uncertainty Management (SUM), pp. 28–39 (2009)


  2. Antonucci, A., Zaffalon, M.: Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks. Int. J. Approximate Reasoning 49(2), 345–361 (2008)


  3. Azur, M.J., Stuart, E.A., Frangakis, C., Leaf, P.J.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011)


  4. Choi, Y., Vergari, A., Van den Broeck, G.: Probabilistic circuits: a unifying framework for tractable probabilistic models (2020)


  5. Correia, A.H.C., Peharz, R., de Campos, C.P.: Joints in random forests. In: Advances in Neural Information Processing Systems 33 (NeurIPS) (2020)


  6. Davis, J., Domingos, P.: Bottom-up learning of Markov network structure. In: Proceedings of the 27th International Conference on Machine Learning (ICML), pp. 271–280 (2010)


  7. Khosravi, P., Choi, Y., Liang, Y., Vergari, A., Van den Broeck, G.: On tractable computation of expected predictions. In: Advances in Neural Information Processing Systems 32 (NeurIPS) (2019)


  8. Khosravi, P., Liang, Y., Choi, Y., Van den Broeck, G.: What to expect of classifiers? Reasoning about logistic regression with missing features. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI) (2019)


  9. Kisa, D., Van den Broeck, G., Choi, A., Darwiche, A.: Probabilistic sentential decision diagrams. In: Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR), pp. 1–10 (2014)


  10. Levray, A., Belle, V.: Learning credal sum-product networks. In: Proceedings of the 2nd Conference on Automated Knowledge Base Construction (2020)


  11. Liang, Y., Van den Broeck, G.: Learning logistic circuits. In: Proceedings of the 33rd Conference on Artificial Intelligence (AAAI) (2019)


  12. Llerena, J.V., Mauá, D.D.: Efficient algorithms for robustness analysis of maximum a posteriori inference in selective sum-product networks. Int. J. Approximate Reasoning 126, 158–180 (2020)


  13. Manski, C.F.: Partial identification with missing data: concepts and findings. Int. J. Approximate Reasoning 39(2–3), 151–165 (2005)


  14. Marlin, B.M., Zemel, R.S., Roweis, S.T., Slaney, M.: Recommender systems: missing data and statistical model estimation. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI) (2011)


  15. Mauá, D.D., De Campos, C.P., Benavoli, A., Antonucci, A.: Probabilistic inference in credal networks: new complexity results. J. Artif. Intell. Res. 50, 603–637 (2014)


  16. Mauá, D.D., Conaty, D., Cozman, F.G., Poppenhaeger, K., de Campos, C.P.: Robustifying sum-product networks. Int. J. Approximate Reasoning 101, 163–180 (2018)


  17. Mohan, K., Pearl, J., Tian, J.: Graphical models for inference with missing data. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 1277–1285 (2013)


  18. Peharz, R., Gens, R., Domingos, P.: Learning selective sum-product networks. In: Proceedings of the Workshop on Learning Tractable Probabilistic Models (2014)


  19. Peharz, R., Gens, R., Pernkopf, F., Domingos, P.: On the latent variable interpretation in sum-product networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(10), 2030–2044 (2017)


  20. Peharz, R., et al.: Random sum-product networks: a simple and effective approach to probabilistic deep learning. In: Proceedings of The 35th Uncertainty in Artificial Intelligence Conference (UAI) (2020)


  21. Poon, H., Domingos, P.: Sum-product networks: a new deep architecture. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 337–346 (2011)


  22. Rahman, T., Kothalkar, P., Gogate, V.: Cutset networks: a simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp. 630–645 (2014)


  23. Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)


  24. Shao, X., Molina, A., Vergari, A., Stelzner, K., Peharz, R., Liebig, T., Kersting, K.: Conditional sum-product networks: imposing structure on deep probabilistic architectures. In: Proceedings of the 10th International Conference on Probabilistic Graphical Models (PGM) (2020)


  25. Shen, Y., Choi, A., Darwiche, A.: A tractable probabilistic model for subset selection. In: Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI) (2017)


  26. Shen, Y., Goyanka, A., Darwiche, A., Choi, A.: Structured Bayesian networks: from inference to learning with routes. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) (2019)


  27. Shin, J., Wu, S., Wang, F., De Sa, C., Zhang, C., Ré, C.: Incremental knowledge base construction using DeepDive. In: Proceedings of the VLDB Endowment (2015)


  28. Zaffalon, M.: Conservative rules for predictive inference with incomplete data. In: Proceedings of the 4th International Symposium on Imprecise Probabilities and Their Applications (ISIPTA), pp. 406–415 (2005)


  29. Zaffalon, M., Corani, G., Mauá, D.: Evaluating credal classifiers by utility-discounted predictive accuracy. Int. J. Approximate Reasoning 53(8), 1282–1301 (2012)


  30. Zaffalon, M., Miranda, E.: Conservative inference rule for uncertain reasoning under incompleteness. J. Artif. Intell. Res. 34, 757–821 (2009)


  31. Zheng, K., Pronobis, A., Rao, R.P.N.: Learning graph-structured sum-product networks for probabilistic semantic maps. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) (2018)



Author information


Corresponding author

Correspondence to Julissa Villanueva Llerena.


Proof of Theorem 1


Membership in coNP is immediate: given a configuration \(\mathbf {u}\), we can compute \(\mathtt {M}(y',\mathbf {o},\mathbf {u})\) and \(\mathtt {M}(y'',\mathbf {o},\mathbf {u})\) in linear time and decide the sign of their difference in constant time. Hence any no-instance admits a polynomial-time verifiable certificate.

We show hardness by reduction from the subset sum problem: Given positive integers \(z_1,\dotsc ,z_n\), decide

$$\begin{aligned} \exists u \in \{0,1\}^n: \sum _{i \in [n]} v_i u_i = 1\, , \quad \text {where } v_i = \frac{2z_i}{\sum _{i \in [n]} z_i}\,. \end{aligned}$$
(11)
[Figure: tree-shaped deterministic PC used in the reduction]

To solve that problem, build a tree-shaped deterministic PC as shown above, where \(U_i\) are binary variables, \(P_1(u) = \prod _i e^{-2v_iu_i}\) and \(P_2(u)=\prod _i e^{-v_iu_i}\). Note that the PC is not class-factorized. Use the PC to compute:

$$ \delta (y',y'') = \min _u \left[ a\exp \left( -2\sum _i v_iu_i \right) - b\exp \left( -\sum _i v_iu_i \right) \right] \,. $$

If we write \(x:=\exp (-\sum _i v_i u_i)\), the above expression is the minimum over positive x of \(f(x):=ax^2 - bx\). Function f is strictly convex and minimized at \(x=b/(2a)\). Selecting a and b such that \(b/(2a)=e^{-1}\) places the minimum at \(\sum _i v_iu_i = 1\). Thus, there is a solution to (11) if and only if \(\delta (y',y'') \le -ae^{-2}\). As stated, the proof is not quite valid because the distributions \(P_1(u)\) and \(P_2(u)\) use non-rational numbers. However, we can apply the same strategy used to prove Theorem 5 in [16] and exploit the rational gap between yes and no instances of the original problem to encode rational approximations of \(P_1\) and \(P_2\) of polynomial size.    \(\square \)
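The reduction can be sanity-checked by brute force on small instances. The sketch below (parameter choices a = 1 and b = 2a/e are ours; the code is illustrative, not part of the paper) enumerates all configurations u and verifies that \(\delta (y',y'') \le -ae^{-2}\) holds exactly when the subset-sum instance has a solution, i.e. when some u attains \(\sum _i v_iu_i = 1\):

```python
import math
from itertools import product

def delta(vs, a, b):
    """min over u in {0,1}^n of a*exp(-2*s) - b*exp(-s), s = sum_i v_i*u_i."""
    best = math.inf
    for u in product([0, 1], repeat=len(vs)):
        s = sum(v * ui for v, ui in zip(vs, u))
        best = min(best, a * math.exp(-2 * s) - b * math.exp(-s))
    return best

def has_solution(zs):
    """Subset sum: sum_i v_i*u_i = 1 iff some subset of zs sums to sum(zs)/2."""
    total = sum(zs)
    return any(2 * sum(z * ui for z, ui in zip(zs, u)) == total
               for u in product([0, 1], repeat=len(zs)))

a = 1.0
b = 2 * a * math.exp(-1)  # makes f(x) = a*x^2 - b*x minimal at x = e^{-1}

for zs in [(1, 2, 3), (1, 2, 4), (3, 5, 7)]:
    vs = [2 * z / sum(zs) for z in zs]
    reaches_bound = delta(vs, a, b) <= -a * math.exp(-2) + 1e-12
    assert reaches_bound == has_solution(zs)
```

Here (1, 2, 3) is a yes-instance (the subset {1, 2} or {3} sums to half the total), while (1, 2, 4) and (3, 5, 7) have odd or unreachable half-sums, so the minimum stays strictly above \(-ae^{-2}\).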


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Llerena, J.V., Mauá, D.D., Antonucci, A. (2021). Cautious Classification with Data Missing Not at Random Using Generative Random Forests. In: Vejnarová, J., Wilson, N. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2021. Lecture Notes in Computer Science, vol. 12897. Springer, Cham. https://doi.org/10.1007/978-3-030-86772-0_21


  • DOI: https://doi.org/10.1007/978-3-030-86772-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86771-3

  • Online ISBN: 978-3-030-86772-0

  • eBook Packages: Computer Science (R0)
