
Wrapper positive Bayesian network classifiers

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

In the information retrieval framework, there are problems where the goal is to recover objects of a particular class from large sets of unlabelled objects. In some of these problems, only examples from the class we want to recover are available. For such problems, the machine learning community has developed algorithms that are able to learn binary classifiers in the absence of negative examples. Among them are the positive Bayesian network classifiers, algorithms that induce Bayesian network classifiers from positive and unlabelled examples. The main drawback of these algorithms is that they require some previous knowledge about the a priori probability distribution of the class. In this paper, we propose a wrapper approach to tackle the learning when no such information is available, setting this probability at the value that is optimal in terms of the recovery of positive examples. The evaluation of classifiers in positive unlabelled learning problems is a non-trivial question. We have also worked on this problem, proposing a new guiding metric, which we call the pseudo F, to be used in the search for the optimal a priori probability of the positive class. We have empirically tested the proposed metric and the wrapper classifiers on both synthetic and real-life datasets. The results of this empirical comparison show that the wrapper Bayesian network classifiers provide competitive results, particularly when the actual a priori probability of the positive class is high.
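The wrapper idea the abstract describes can be sketched briefly. The sketch below is hypothetical: as a computable stand-in for the pseudo F metric defined in the paper, it scores each candidate training prior with the related PU-learning criterion recall² / Pr(f(x) = 1) of Lee and Liu [21], and the `evaluate` callback is a placeholder for training and evaluating a positive Bayesian classifier.

```python
def pseudo_f_score(recall: float, frac_predicted_positive: float) -> float:
    """Score a classifier from positive and unlabelled data alone.

    Uses the criterion recall^2 / Pr(f(x) = 1) (Lee and Liu [21]),
    which is proportional to precision * recall and needs no negative
    labels; it stands in here for the paper's pseudo F metric.
    """
    if frac_predicted_positive == 0.0:
        return 0.0
    return recall ** 2 / frac_predicted_positive


def wrapper_search(candidate_priors, evaluate):
    """Pick the training prior p_t that maximises the guiding metric.

    `evaluate(p_t)` is a hypothetical placeholder: it should train a
    positive Bayesian classifier with training prior p_t and return its
    estimated recall and the fraction of examples predicted positive.
    """
    best_p, best_score = None, float("-inf")
    for p_t in candidate_priors:
        recall, frac_pos = evaluate(p_t)
        score = pseudo_f_score(recall, frac_pos)
        if score > best_score:
            best_p, best_score = p_t, score
    return best_p, best_score
```

Because the metric is computable from positive and unlabelled data only, the search needs no knowledge of the actual class prior.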


Notes

  1. Not, at least, directly from positive and unlabelled examples. In Elkan et al. [15], the authors present an interesting theoretical result that could be used to estimate this value under certain assumptions. However, the authors propose other ways to use this theoretical result to learn classifiers, as we will see in the experimental part of the paper.

  2. Henceforth, we will refer to the actual a priori probability of the positive class \(P(C=1)\) as \(p\) and, in order to avoid mixing concepts, will refer to the parameter used in the training of PBCs as ‘training \(p\)’ or \(p_t\), to distinguish it from the actual a priori probability of the positive class.

  3. We are assuming that \(X_i\), \(i=1,\ldots ,n\) are discrete variables that can take \(r_i\) values and that the class \(C\) is a binary variable that can take two values, 0 (or negative) and 1 (or positive).

  4. More details on the estimation of the naive Bayes model parameters in the absence of negative examples can be found in Calvo [3] and Denis et al. [13].
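As a minimal sketch of the parameter estimation referred to in the note above, and assuming the decomposition P(x) = p·P(x|1) + (1 − p)·P(x|0) exploited by Denis et al. [13], the negative-class conditional of a discrete attribute can be recovered from the marginal (estimated on the unlabelled sample) and the positive conditional (estimated on the positive sample). Function and argument names are hypothetical.

```python
def estimate_negative_conditional(value_counts_unlabelled,
                                  value_counts_positive,
                                  p, n_unlabelled, n_positive):
    """Estimate P(X_i = x | C = 0) for one discrete attribute.

    Solves P(x) = p * P(x|1) + (1 - p) * P(x|0) for P(x|0), with
    P(x) estimated on the unlabelled sample and P(x|1) on the
    positive sample, given the class prior p.
    """
    est = {}
    for value, count in value_counts_unlabelled.items():
        p_x = count / n_unlabelled
        p_x_pos = value_counts_positive.get(value, 0) / n_positive
        # Clip at zero: sampling noise can make the raw estimate negative.
        est[value] = max(0.0, (p_x - p * p_x_pos) / (1.0 - p))
    return est
```

In practice the resulting estimates would also be renormalised (and smoothed) so that they sum to one over the attribute's values.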

  5. Except when the previous optimum lies at an endpoint of the interval, in which case the optimum will also lie at the corresponding endpoint of the new interval, not at its middle.

  6. Obviously, any information about the upper and/or lower bounds of this parameter could be introduced at this point, reducing the initial search space.
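The interval-refinement search over the training prior described in the two notes above can be sketched as follows. The grid size, iteration count and `evaluate` callback are hypothetical placeholders for the actual wrapper procedure; the endpoint handling mirrors the note: a best point found at an interval endpoint stays at the endpoint of the narrower interval, so the search never steps outside the initial bounds.

```python
def refine_interval(low, high, evaluate, grid_size=5, iterations=3):
    """Iteratively narrow the search interval for the training prior.

    Evaluates a small grid over [low, high], then recentres a
    half-width interval on the best point; an endpoint optimum keeps
    its position as an endpoint of the new interval.
    """
    best = low
    for _ in range(iterations):
        step = (high - low) / (grid_size - 1)
        grid = [low + i * step for i in range(grid_size)]
        best = max(grid, key=evaluate)
        half = (high - low) / 4  # new interval has half the width
        if best == grid[0]:        # optimum at the lower endpoint
            low, high = best, best + 2 * half
        elif best == grid[-1]:     # optimum at the upper endpoint
            low, high = best - 2 * half, best
        else:                      # optimum in the interior
            low, high = best - half, best + half
    return best
```

Any known bounds on the prior, as note 6 observes, would simply tighten the initial `[low, high]`.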

  7. Details about the evaluation of the models will be provided in the next section.

  8. In the most general probabilistic framework, an instance belongs to both the positive and negative class with some probability. In this work, we will focus on the particular case when the probability of belonging to one class is 1 and the probability of belonging to the other class is 0, that is, problems where the instances belong to one, and only one class.

  9. The feasibility of this calculation will depend on the number of possible instances. When the exact value cannot be obtained, other approaches should be taken.

  10. As the estimation of the recall involves only the positive examples, there is no need for dividing the unlabelled examples into folds.
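A minimal sketch of the recall estimation described in the note above: only the positive set is split into folds, while the unlabelled set is used whole in every training run. `train` and `predict` are hypothetical placeholders for the actual PBC learning and prediction routines.

```python
def cross_validated_recall(positives, unlabelled, train, predict, k=5):
    """Estimate recall with k-fold CV over the positive examples only.

    Each fold of positives is held out for testing; the full
    unlabelled set participates in every training run, since recall
    is computed from the positives alone.
    """
    hits = 0
    for i in range(k):
        test_fold = positives[i::k]  # every k-th positive example
        train_fold = [x for j, x in enumerate(positives) if j % k != i]
        model = train(train_fold, unlabelled)
        hits += sum(1 for x in test_fold if predict(model, x) == 1)
    return hits / len(positives)
```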

  11. Note that \(p\) refers to the actual a priori probability of the positive class that is a constant for a given problem domain.

  12. In the original dataset, the attributes are continuous. As the classifiers used work with discrete attributes, the original features have been discretised into three intervals using an equal frequency strategy.

  13. Note that, at most, we need 4,800 positive instances, and only 4,044 are available. To overcome this problem, the positive examples in the set of unlabelled instances are resampled when needed (i.e. there are repeated positive examples in the set of unlabelled instances in some problems). This resampling process is used to complete only the unlabelled instances, never the set of known positive examples.

  14. The complete set of results is available on the supplementary data web page at http://www.sc.ehu.es/ccwbayes/members/borxa/wPBC/.

  15. In a given comparison, we have two paired samples (the evaluations of two classifiers on fifty datasets). The Wilcoxon signed-rank test is run on the differences between the performance of the two classifiers on each dataset, in order to exploit the fact that the samples are paired.

  16. Both the tests and the correction were carried out using R. For the multiple testing correction, the package multtest was used.

  17. The algorithm does not assume this, but it builds a classifier to differentiate labelled and unlabelled examples which, in practical terms, is the same as building a classifier to distinguish positive and negative examples assuming that all the unlabelled instances are negative.

References

  1. Asuncion A, Newman DJ (2007) UCI machine learning repository

  2. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29(4):1165–1188

  3. Calvo B (2008) Positive unlabelled learning with applications in computational biology. Ph.D. thesis, Facultad de Informatica–UPV/EHU

  4. Calvo B, Larrañaga P, Lozano JA (2007a) Learning Bayesian classifiers from positive and unlabeled examples. Pattern Recognit Lett 28:2375–2384

  5. Calvo B, Larrañaga P, Lozano JA (2009) Feature subset selection from positive and unlabelled examples. Pattern Recognit Lett 30(11):1027–1036

  6. Calvo B, López-Bigas N, Furney SJ, Larrañaga P, Lozano JA (2007b) A partially supervised classification approach to dominant and recessive human disease gene prediction. Comput Methods Program Biomed 85:229–237

  7. Castelo R, Guigó R (2004) Splice site identification by idlBNs. Bioinformatics 4(20):i69–i76

  8. Cerulo L, Elkan C, Ceccarelli M (2010) Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinform 11:228

  9. Chapelle O, Zien A, Schölkopf B (2006) Semi-supervised learning. MIT Press, Cambridge

  10. Chen Y, Li Z, Wang X, Feng J, Hu X (2010) Predicting gene function using few positive examples and unlabeled ones. BMC Genomics 11(Suppl 2):S11

  11. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38

  12. Denis F (1998) PAC learning from positive statistical queries. In: Proceedings of the 9th international conference on algorithmic learning theory. Springer, New York, pp 112–126

  13. Denis F, Gilleron R, Tommasi M (2002) Text classification from positive and unlabeled examples. In: Proceedings of the 9th international conference on information processing and management of uncertainty in knowledge-based Systems. IPMU 2002, pp 1927–1934

  14. Duan Q, Miao D, Jin K (2007) A rough set approach to classifying web page without negative examples. In: Proceedings of the 11th Pacific-Asia conference on advances in knowledge discovery and data mining. Springer, New York, pp 481–488

  15. Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. In: Proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, USA, pp 213–220

  16. Friedman N, Geiger D, Goldszmit M (1997) Bayesian network classifiers. Mach Learn 29:131–163

  17. Furney SJ, Calvo B, Larrañaga P, Lozano JA, Lopez-Bigas N (2008) Prioritization of candidate cancer genes-an aid to oncogenomic studies. Nucleic Acids Res 36(18):e115

  18. Ide JS, Cozman FG (2002) Random generation of Bayesian networks. In: Proceedings of the 16th Brazilian symposium on artificial intelligence: advances in artificial intelligence. Springer, New York, pp 366–375

  19. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324

  20. Kononenko I (1991) Semi-naive Bayes classifiers. In: Proceedings of the 6th European working session on learning, pp 206–219

  21. Lee WS, Liu B (2003) Learning with positive and unlabeled examples using weighted logistic regression. In: Proceedings of the twentieth international conference on machine learning, pp 448–455

  22. Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. In: Proceedings of the eighteenth international joint conference on artificial intelligence 2003, pp 587–594

  23. Li X-L, Liu B, Ng S-K (2010) Negative training data can be harmful to text classification. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp 218–228

  24. Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), pp 179–186

  25. Liu B, Lee WS, Yu PS, Li X (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning, pp 387–394

  26. Minsky M (1961) Steps toward artificial intelligence. Proc Inst Radio Eng 49:8–30

  27. Pan S, Zhang Y, Li X (2012) Dynamic classifier ensemble for positive unlabeled text stream classification. Knowl Inf Syst 1–21. doi:10.1007/s10115-011-0469-2

  28. Pazzani M (1997) Searching for dependencies in Bayesian classifiers. Springer, New York

  29. Rodríguez JD, Pérez A, Lozano JA (2010) Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell 32(3):569–575

  30. Sriphaew K, Takamura H, Okumura M (2009) Cool blog classification from positive and unlabeled examples. In: Proceedings of the 13th Pacific-Asia conference on advances in knowledge discovery and data mining, pp 62–73

  31. Tsai RT-H, Hung H-C, Dai H-J, Lin Y-W, Hsu W-L (2008) Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles. BMC Bioinform 9(Suppl 1):S3

  32. Wang C, Ding C, Meraz RF, Holbrook SR (2006) PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 22(21):2590–2596

  33. Xiao Y, Segal MR (2008) Biological sequence classification utilizing positive and unlabeled data. Bioinformatics 24(9):1198–1205

  34. Yousef M, Jung S, Showe LC, Showe MK (2008) Learning from positive examples when the negative class is undetermined-micro RNA gene identification. Algorithms Mol Biol 3:2

  35. Yu H, Han J, Chang K (2004) PEBL: Web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1):70–81

  36. Yu H, Han J, Chang KC-C (2002) PEBL: positive example based learning for web page classification using SVM. In: Proceedings of the Eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 239–248

  37. Zhang D, Lee WS (2005) A simple probabilistic approach to learning from positive and unlabeled examples. In: Proceedings of Fifth annual UK conference on computational intelligence (UKCI), pp 83–87

  38. Zhao X-M, Wang Y, Chen L, Aihara K (2008) Gene function prediction using labeled and unlabeled data. BMC Bioinform 9:57

Acknowledgments

This work has been partially supported by the Saiotek and Research Groups 2007–2012 (IT-242-07) programs (Basque Government), TIN2010-14931, TIN2010-20900-C04-04 Consolider Ingenio 2010-CSD2007-00018 and Cajal Blue Brain Project (C080020-09) projects (Spanish Ministry of Science and Innovation), the COMBIOMED network in computational bio-medicine (Carlos III Health Institute) and the Diputación Foral de Gipuzkoa (2011-CIEN-000060-01).

Author information

Correspondence to Borja Calvo.

Cite this article

Calvo, B., Inza, I., Larrañaga, P. et al. Wrapper positive Bayesian network classifiers. Knowl Inf Syst 33, 631–654 (2012). https://doi.org/10.1007/s10115-012-0553-2
