Abstract
The performance of machine learning algorithms relies heavily on the abundance and quality of the available training data. However, the data may originate from diverse subspaces, so a labeled training set typically offers only limited insight and inadequately represents the full range of potential scenarios. How to safely exploit unlabeled instances is therefore an emerging and important problem for learning Bayesian network classifiers (BNCs), which graphically model the probabilistic relationships among variables in the form of a directed acyclic graph (DAG). In this paper, we introduce dual learning into the learning procedure to safely exploit unlabeled instances. We elucidate the mapping between information metrics and local DAGs, as well as the distinction between informational (in)dependence and probabilistic (in)dependence. Building on this foundation, we propose new metrics that accurately measure attribute dependencies within unlabeled instances. The proposed dual learning-based flexible selective k-dependence Bayesian network classifier (DL-FSKDB) employs eager learning to construct an initial model and lazy learning for personalized fine-tuning and optimization. Extensive experimental evaluation on 36 datasets spanning domains with distinct properties shows that the learned BNCs deliver competitive classification performance against state-of-the-art learners in terms of zero–one loss, bias, variance, and F1-measure.
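To make the abstract's terminology concrete, the sketch below illustrates, assuming discrete attributes, the two classical information metrics on which KDB-style structure learning is built: mutual information I(Xi; C) for ranking attributes by relevance to the class, and conditional mutual information I(Xi; Xj | C) for selecting up to k attribute parents. This is a minimal illustration of the standard metrics only, not the paper's proposed unlabeled-data metrics or the DL-FSKDB implementation; all function names and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def mutual_information(x, c):
    """I(X; C) between two discrete variables, estimated from empirical frequencies."""
    x, c = np.asarray(x), np.asarray(c)
    n = len(x)
    mi = 0.0
    for (xv, cv), n_xc in Counter(zip(x.tolist(), c.tolist())).items():
        p_xc = n_xc / n            # joint probability P(x, c)
        p_x = np.mean(x == xv)     # marginal P(x)
        p_c = np.mean(c == cv)     # marginal P(c)
        mi += p_xc * np.log2(p_xc / (p_x * p_c))
    return mi

def conditional_mutual_information(xi, xj, c):
    """I(Xi; Xj | C) = sum over class values cv of P(cv) * I(Xi; Xj | C = cv)."""
    xi, xj, c = np.asarray(xi), np.asarray(xj), np.asarray(c)
    cmi = 0.0
    for cv, n_c in Counter(c.tolist()).items():
        mask = (c == cv)
        cmi += (n_c / len(c)) * mutual_information(xi[mask], xj[mask])
    return cmi

# Toy usage: rank attributes by I(Xi; C); a KDB-style learner would then give
# each attribute the class plus up to k already-ranked attributes as parents,
# chosen by the highest I(Xi; Xj | C).
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]])
y = np.array([0, 1, 0, 1, 1, 0])
ranking = sorted(range(X.shape[1]), key=lambda i: -mutual_information(X[:, i], y))
print(ranking, conditional_mutual_information(X[:, 0], X[:, 1], y))
```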
Data availability
All datasets used in the paper are publicly available at http://archive.ics.uci.edu/ml
Acknowledgements
This work is supported by the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing, China (No. KLIGIP-2021A04), the Scientific and Technological Development Scheme of Jilin Province, China (No. 20240101371JC), and the Key Research Project of Sports Science in Jilin Province (No. 202423).
Author information
Contributions
Y.Z. handled methodology, visualization, and wrote the original draft. L.W. was responsible for conceptualization, supervision, writing (review and editing), and funding acquisition. X.Z. performed formal analysis, investigation, and resources. T.J. contributed to investigation, resources, and validation. M.S. and X.L. were involved in investigation, validation, and review.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, Y., Wang, L., Zhu, X. et al. Probability knowledge acquisition from unlabeled instance based on dual learning. Knowl Inf Syst 67, 521–547 (2025). https://doi.org/10.1007/s10115-024-02238-9