
Density Estimators for Positive-Unlabeled Learning

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2017)

Abstract

Positive-Unlabeled (PU) learning works by considering a set of positive samples, and a (usually larger) set of unlabeled ones. This challenging setting requires algorithms to cleverly exploit dependencies hidden in the unlabeled data in order to build models able to accurately discriminate between positive and negative samples. We propose to exploit probabilistic generative models to characterize the distribution of the positive samples, and to label as reliable negative samples those that are in the lowest density regions with respect to the positive ones. The overall framework is flexible enough to be applied to many domains by leveraging tools provided by years of research from the probabilistic generative model community. Results on several benchmark datasets show the performance and flexibility of the proposed approach.
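The strategy outlined in the abstract can be illustrated with a minimal sketch: fit a density estimator on the positive samples only, score the unlabeled samples under it, and treat the lowest-density ones as reliable negatives. This is not the paper's exact pipeline (the paper uses probabilistic generative models over discrete data, such as cutset networks); here scikit-learn's `KernelDensity` stands in as the estimator, and the toy continuous data and all variable names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Hypothetical toy data: positives cluster around (0, 0); the unlabeled
# set mixes hidden positives with hidden negatives around (5, 5).
X_pos = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
X_unl = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),  # hidden positives
    rng.normal(loc=5.0, scale=1.0, size=(50, 2)),  # hidden negatives
])

# 1. Characterize the distribution of the positive samples
#    with a density estimator fitted on positives only.
kde = KernelDensity(kernel="gaussian", bandwidth=0.75).fit(X_pos)

# 2. Score every unlabeled sample under that distribution
#    (score_samples returns log-densities).
log_density = kde.score_samples(X_unl)

# 3. Label the unlabeled samples lying in the lowest-density
#    regions w.r.t. the positives as reliable negatives.
threshold = np.quantile(log_density, 0.4)
reliable_neg = X_unl[log_density <= threshold]
```

Once reliable negatives are extracted this way, a standard binary classifier can be trained on the positive samples versus the reliable negatives, which is the usual second stage of two-step PU learning approaches.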


Notes

  1. This paper is an extended version of [2] presented at the International Workshop NFMCP held in conjunction with ECML/PKDD 2017.

  2. http://archive.ics.uci.edu/ml/.

  3. The datasets and settings used in [18] were kindly provided by Dino Ienco.

  4. http://www.bnlearn.com/.

  5. The same set of experiments was also conducted using the likelihood as the scoring function, leading to overfitted models with overall results worse than those obtained using the K2 score.

  6. http://libra.cs.uoregon.edu/.

  7. For this stage only, categorical data are one-hot encoded.

  8. http://scikit-learn.org/.

References

  1. Balasubramanian, V.: MDL, Bayesian inference, and the geometry of the space of probability distributions. In: Grünwald, P.D., Myung, I.J., Pitt, M.A. (eds.) Advances in Minimum Description Length: Theory and Applications, pp. 81–98. MIT Press, Cambridge (2005)

  2. Basile, T., Di Mauro, N., Esposito, F., Ferilli, S., Vergari, A.: Generative probabilistic models for positive-unlabeled learning. In: Workshop on NFMCP Held with ECML/PKDD (2017)

  3. Bengio, Y., Courville, A.C., Vincent, P.: Unsupervised feature learning and deep learning: a review and new perspectives. CoRR abs/1206.5538 (2012)

  4. Calvo, B., Larrañaga, P., Lozano, J.A.: Learning Bayesian classifiers from positive and unlabeled examples. Pattern Recogn. Lett. 28(16), 2375–2384 (2007)

  5. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 15:1–15:58 (2009)

  6. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theor. 14, 462–467 (1968)

  7. Cooper, G.F., Herskovits, E.: A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 9(4), 309–347 (1992)

  8. De Comité, F., Denis, F., Gilleron, R., Letouzey, F.: Positive and unlabeled examples help learning. In: Watanabe, O., Yokomori, T. (eds.) ALT 1999. LNCS (LNAI), vol. 1720, pp. 219–230. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-46769-6_18

  9. Di Mauro, N., Vergari, A., Basile, T.M.A., Esposito, F.: Fast and accurate density estimation with extremely randomized cutset networks. In: Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Džeroski, S. (eds.) ECML PKDD 2017. LNCS (LNAI), vol. 10534, pp. 203–219. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71249-9_13

  10. Di Mauro, N., Vergari, A., Basile, T.M.A.: Learning Bayesian random cutset forests. In: Esposito, F., Pivert, O., Hacid, M.-S., Raś, Z.W., Ferilli, S. (eds.) ISMIS 2015. LNCS (LNAI), vol. 9384, pp. 122–132. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25252-0_13

  11. Di Mauro, N., Vergari, A., Esposito, F.: Learning accurate cutset networks by exploiting decomposability. In: Gavanelli, M., Lamma, E., Riguzzi, F. (eds.) AI*IA 2015. LNCS (LNAI), vol. 9336, pp. 221–232. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24309-2_17

  12. Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: KDD, pp. 213–220 (2008)

  13. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29(2–3), 131–163 (1997)

  14. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7

  15. Hempstalk, K., Frank, E., Witten, I.H.: One-class classification by combining density and class probability estimation. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008. LNCS (LNAI), vol. 5211, pp. 505–519. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-87479-9_51

  16. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006)

  17. Hoi, C.H., Chan, C.H., Huang, K., Lyu, M.R., King, I.: Biased support vector machine for relevance feedback in image retrieval. In: IJCNN, pp. 3189–3194 (2004)

  18. Ienco, D., Pensa, R.G.: Positive and unlabeled learning in categorical data. Neurocomputing 196, 113–124 (2016)

  19. Ienco, D., Pensa, R.G., Meo, R.: From context to distance: learning dissimilarity for categorical data clustering. TKDD 6(1), 1:1–1:25 (2012)

  20. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)

  21. Li, H., Chen, Z., Liu, B., Wei, X., Shao, J.: Spotting fake reviews via collective positive-unlabeled learning. In: ICDM, pp. 899–904 (2014)

  22. Liu, B., Dai, Y., Li, X., Lee, W.S., Yu, P.S.: Building text classifiers using positive and unlabeled examples. In: ICDM, pp. 179–188 (2003)

  23. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially supervised classification of text documents. In: ICML, pp. 387–394 (2002)

  24. Lowd, D., Rooshenas, A.: The Libra toolkit for probabilistic models. CoRR abs/1504.00110 (2015)

  25. Meila, M., Jordan, M.I.: Learning with mixtures of trees. JMLR 1, 1–48 (2000)

  26. du Plessis, M.C., Sugiyama, M.: Semi-supervised learning of class balance under class-prior change by distribution matching. Neural Netw. 50, 110–119 (2014)

  27. Riahi, F., Schulte, O., Li, Q.: A proposal for statistical outlier detection in relational structures. In: SRAI AAAI Workshop (2014)

  28. Schölkopf, B., Platt, J.C., Shawe-Taylor, J.C., Smola, A.J., Williamson, R.C.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)

  29. Tax, D.M.J., Duin, R.P.W.: Support vector data description. Mach. Learn. 54(1), 45–66 (2004)

  30. Vergari, A., Di Mauro, N., Esposito, F.: Visualizing and understanding sum-product networks. CoRR abs/1608.08266 (2016)

  31. Vergari, A., Di Mauro, N., Esposito, F.: Simplifying, regularizing and strengthening sum-product network structure learning. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9285, pp. 343–358. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23525-7_21

  32. Xu, J., Shelton, C.R.: Intrusion detection using continuous time Bayesian networks. J. Artif. Int. Res. 39(1), 745–774 (2010)

  33. Yang, E., Baker, Y., Ravikumar, P., Allen, G., Liu, Z.: Mixed graphical models via exponential families. In: AISTATS, pp. 1042–1050 (2014)

  34. Yang, P., Li, X.L., Mei, J.P., Kwoh, C.K., Ng, S.K.: Positive-unlabeled learning for disease gene identification. Bioinformatics 28, 2640–2647 (2012)

  35. Zhao, Y., Kong, X., Philip, S.Y.: Positive and unlabeled learning for graph classification. In: ICDM, pp. 962–971 (2011)

  36. Zhou, J., Pan, S., Mao, Q., Tsang, I.: Multi-view positive and unlabeled learning. In: ACML, pp. 555–570 (2012)

  37. Zhou, K., Gui-Rong, X., Yang, Q., Yu, Y.: Learning with positive and unlabeled examples using topic-sensitive PLSA. TKDE 22(1), 46–58 (2010)


Author information

Correspondence to Nicola Di Mauro.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Basile, T.M.A., Di Mauro, N., Esposito, F., Ferilli, S., Vergari, A. (2018). Density Estimators for Positive-Unlabeled Learning. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2017. Lecture Notes in Computer Science, vol 10785. Springer, Cham. https://doi.org/10.1007/978-3-319-78680-3_4


  • DOI: https://doi.org/10.1007/978-3-319-78680-3_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78679-7

  • Online ISBN: 978-3-319-78680-3

