Probability knowledge acquisition from unlabeled instance based on dual learning

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

The performance of machine learning algorithms relies heavily on the abundance and quality of the available training data. However, the data may originate from diverse data subspaces, so a labeled training set typically offers only limited insight and inadequately represents the full range of potential scenarios. How to safely make use of unlabeled instances is an emerging and interesting problem for learning Bayesian network classifiers (BNCs), which graphically model the probabilistic relationships among variables in the form of a directed acyclic graph (DAG). In this paper, we introduce dual learning into the learning procedure to realize the safe exploitation of unlabeled instances. We elucidate the mapping between the information metric and the local DAG, as well as the distinction between informational (in)dependence and probabilistic (in)dependence. Building upon this foundation, we propose new metrics to accurately measure attribute dependencies within unlabeled instances. The proposed dual learning-based flexible selective k-dependence Bayesian network classifier (DL-FSKDB) employs eager learning to construct the initial model and incorporates lazy learning for personalized fine-tuning and optimization. Extensive experimental evaluations on 36 datasets spanning various domains with distinct properties show that the learned BNCs achieve competitive classification performance compared with state-of-the-art learners in terms of zero-one loss, bias and variance, as well as F1-measure.
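For context, the sketch below illustrates the classical k-dependence Bayesian (KDB) structure-learning step on which selective k-dependence classifiers such as DL-FSKDB build: attributes are ranked by mutual information with the class, and each attribute then receives up to k attribute parents chosen by conditional mutual information. This is a minimal sketch of the standard baseline, not the paper's dual-learning procedure, and all function and variable names are illustrative.

```python
# Minimal sketch of classical KDB structure learning (not the paper's
# DL-FSKDB dual-learning procedure); all names are illustrative.
import numpy as np

def mutual_information(x, y):
    """I(X; Y) for two discrete variables given as 1-D integer arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(Y)
    nz = joint > 0                          # skip empty cells to avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def conditional_mutual_information(x, y, c):
    """I(X; Y | C) = sum over class values of P(c) * I(X; Y | C = c)."""
    return sum((c == cv).mean() * mutual_information(x[c == cv], y[c == cv])
               for cv in np.unique(c))

def kdb_parents(X, y, k=2):
    """Map each attribute index to its attribute parents in a KDB structure.

    Attributes are ranked by I(X_i; C); each attribute may take up to k
    parents among higher-ranked attributes, chosen by I(X_i; X_j | C).
    The class is implicitly a parent of every attribute.
    """
    rank = sorted(range(X.shape[1]),
                  key=lambda i: -mutual_information(X[:, i], y))
    parents, placed = {}, []
    for i in rank:
        best = sorted(placed,
                      key=lambda j: -conditional_mutual_information(
                          X[:, i], X[:, j], y))
        parents[i] = best[:k]
        placed.append(i)
    return parents

# Toy usage: six discrete attributes, binary class.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 6))
y = (X[:, 0] + X[:, 1] > 2).astype(int)
print(kdb_parents(X, y, k=2))
```

In the paper's setting, the additional difficulty is that such dependence metrics must also be estimated safely from unlabeled instances, which is what the dual-learning machinery addresses.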


Data availability

All datasets used in the paper are publicly available at http://archive.ics.uci.edu/ml.

Acknowledgements

This work is supported by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing, China (No. KLIGIP-2021A04), the Scientific and Technological Development Scheme of Jilin Province, China (No. 20240101371JC), and the Key Research Project of Sports Science in Jilin Province (No. 202423).

Author information


Contributions

Y.Z. handled methodology, visualization, and wrote the original draft. L.W. was responsible for conceptualization, supervision, writing (review and editing), and funding acquisition. X.Z. performed formal analysis, investigation, and resources. T.J. contributed to investigation, resources, and validation. M.S. and X.L. were involved in investigation, validation, and review.

Corresponding author

Correspondence to Minghui Sun.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Code availability

The source code is available at https://github.com/Bayes514/DL-FSKDB.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

See Table 6.

Table 6 A summary table of the statistics employed

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, Y., Wang, L., Zhu, X. et al. Probability knowledge acquisition from unlabeled instance based on dual learning. Knowl Inf Syst 67, 521–547 (2025). https://doi.org/10.1007/s10115-024-02238-9
