
Stochastic optimization for Bayesian network classifiers

Abstract

Reducing the complexity of the network topology and making the learned joint probability distribution fit the data are two important but conflicting goals in learning a Bayesian network classifier (BNC). By transforming a single high-order topology into a set of low-order ones, ensemble learning algorithms can cover more of the hypotheses implicit in the training data and help achieve a tradeoff between bias and variance. Resampling from the training data can diversify the member classifiers of the ensemble, but the information lost in resampling may bias the estimate of the conditional probability distribution and thus introduce insignificant rather than significant dependency relationships into the network topology of the BNC. In this paper, we propose to learn from the training data as a whole and to apply a heuristic search strategy that flexibly identifies the significant conditional dependencies, so that the attribute order is determined implicitly. Random sampling is introduced to make each member of the ensemble "unstable" and to fully represent the conditional dependencies. Experimental evaluation on 40 UCI datasets shows that the proposed algorithm, called random Bayesian forest (RBF), achieves remarkable classification performance compared with extended versions of state-of-the-art out-of-core BNCs (e.g., SKDB, WATAN, WAODE, SA2DE, SASA2DE and IWAODE).
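
The abstract only outlines the approach at a high level. As a rough illustration of the general idea it describes (an ensemble of low-order BNC members, each estimated from the full training set rather than from bootstrap resamples, with diversity injected by random sampling over candidate conditional dependencies), the following Python sketch may help. Everything here, including the class names, the conditional-mutual-information weighting and the parent-sampling heuristic, is an assumption made for illustration and is not the authors' actual RBF algorithm.

```python
# Minimal illustrative sketch only -- NOT the authors' exact RBF procedure.
# It mimics the idea stated in the abstract: an ensemble of k-dependence-style
# BNC members, each trained on the FULL discrete training set (no resampling),
# with diversity coming from randomized selection of conditional dependencies.
import numpy as np
from collections import defaultdict

def cond_mutual_info(xi, xj, y):
    """Empirical I(Xi; Xj | C) for discrete samples."""
    n = len(y)
    joint, pc, pac, pbc = defaultdict(int), defaultdict(int), defaultdict(int), defaultdict(int)
    for a, b, c in zip(xi, xj, y):
        joint[(a, b, c)] += 1
    for (a, b, c), cnt in joint.items():
        pc[c] += cnt; pac[(a, c)] += cnt; pbc[(b, c)] += cnt
    return sum((cnt / n) * np.log(cnt * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
               for (a, b, c), cnt in joint.items())

class RandomKDBMember:
    def __init__(self, k=2, rng=None):
        self.k, self.rng = k, rng or np.random.default_rng()

    def fit(self, X, y):
        n, d = X.shape
        cmi = np.zeros((d, d))                      # pairwise dependencies given the class
        for i in range(d):
            for j in range(i + 1, d):
                cmi[i, j] = cmi[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
        self.parents = []
        for i in range(d):                          # randomized parent selection (assumed heuristic)
            w = np.clip(cmi[i], 0.0, None); w[i] = 0.0
            if w.sum() == 0.0:
                self.parents.append([]); continue
            m = min(self.k, int(np.count_nonzero(w)))
            self.parents.append(list(self.rng.choice(d, size=m, replace=False, p=w / w.sum())))
        self.classes, counts = np.unique(y, return_counts=True)
        self.prior = (counts + 1.0) / (n + len(self.classes))
        self.n_vals = [len(np.unique(X[:, i])) for i in range(d)]
        self.cpt = [defaultdict(int) for _ in range(d)]   # counts of (value, class, parent values)
        self.par = [defaultdict(int) for _ in range(d)]   # counts of (class, parent values)
        for row, c in zip(X, y):                    # every member uses the whole training set
            for i in range(d):
                pv = tuple(row[j] for j in self.parents[i])
                self.cpt[i][(row[i], c, pv)] += 1
                self.par[i][(c, pv)] += 1
        return self

    def predict_proba(self, x):
        scores = []
        for ci, c in enumerate(self.classes):
            s = np.log(self.prior[ci])
            for i, v in enumerate(x):
                pv = tuple(x[j] for j in self.parents[i])
                s += np.log((self.cpt[i][(v, c, pv)] + 1.0) /
                            (self.par[i][(c, pv)] + self.n_vals[i]))   # Laplace smoothing
            scores.append(s)
        p = np.exp(np.array(scores) - max(scores))
        return p / p.sum()

class RandomBayesianForestSketch:
    """Averages the class posteriors of the randomized members."""
    def __init__(self, n_members=10, k=2, seed=0):
        seeds = np.random.default_rng(seed).integers(0, 2**31, size=n_members)
        self.members = [RandomKDBMember(k, np.random.default_rng(int(s))) for s in seeds]

    def fit(self, X, y):
        for m in self.members:
            m.fit(X, y)
        return self

    def predict(self, X):
        cls = self.members[0].classes
        return np.array([cls[np.argmax(np.mean([m.predict_proba(x) for m in self.members], axis=0))]
                         for x in X])
```

With a discretized dataset X (an n-by-d integer array) and labels y, `RandomBayesianForestSketch(n_members=10, k=2).fit(X, y).predict(X_test)` would produce ensemble predictions. Note that in this sketch every member sees the whole training data, so diversity comes from the randomized dependency selection rather than from resampling, which mirrors the motivation stated in the abstract.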


Notes

  1. https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope

  2. https://archive.ics.uci.edu/ml/datasets/Localization+Data+for+Person+Activity


Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2019YFC1804804), the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A04), and the Scientific and Technological Developing Scheme of Jilin Province (No. 20200201281JC).

Author information

Corresponding author

Correspondence to LiMin Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 Experimental results of RMSE
Table 10 Experimental results of bias
Table 11 Experimental results of variance
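
Tables 10 and 11 report the bias and variance of zero-one loss. As a point of reference (and only as an assumption about the estimator, since the table captions alone do not specify it), these quantities are commonly computed in this literature with the Kohavi-Wolpert decomposition, in which the expected zero-one loss at a test point x splits into an irreducible noise term, a squared bias term and a variance term:

```latex
% Kohavi--Wolpert style decomposition of zero-one loss at a point x
% (assumed here for illustration; the paper may use a different estimator).
\[
\mathbb{E}\bigl[\mathrm{loss}(x)\bigr]
  = \sigma_x^{2} + \mathrm{bias}_x^{2} + \mathrm{variance}_x ,
\]
\[
\mathrm{bias}_x^{2}
  = \tfrac{1}{2}\sum_{y\in\mathcal{Y}}
    \bigl[P(Y_F = y \mid x) - P(Y_H = y \mid x)\bigr]^{2},
\qquad
\mathrm{variance}_x
  = \tfrac{1}{2}\Bigl(1 - \sum_{y\in\mathcal{Y}} P(Y_H = y \mid x)^{2}\Bigr),
\]
```

where $Y_F$ denotes the label produced by the true (target) distribution and $Y_H$ the label predicted by a classifier trained on a randomly drawn training set; bias measures the systematic mismatch between the two, while variance measures the sensitivity of the predictions to the particular training sample.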

About this article

Cite this article

Ren, Y., Wang, L., Li, X. et al. Stochastic optimization for bayesian network classifiers. Appl Intell 52, 15496–15516 (2022). https://doi.org/10.1007/s10489-022-03356-z
