
Graph-based PU learning for binary and multiclass classification without class prior

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

How can we classify graph-structured data with only positive labels? Graph-based positive-unlabeled (PU) learning aims to train a binary classifier given only positive labels when the relationship between examples is given as a graph. The problem is of great importance for tasks such as detecting malicious accounts in a social network, which are difficult to model with supervised learning when true negative labels are absent. Previous works on graph-based PU learning assume that the prior distribution of positive nodes is known in advance, which does not hold in many real-world cases. In this work, we propose GRAB (Graph-based Risk minimization with iterAtive Belief propagation), a novel end-to-end approach for graph-based PU learning that requires no class prior. GRAB alternates between marginalization and update steps. The marginalization step models the given graph as a Markov network and estimates the marginals of the latent variables. The update step trains the binary classifier by using the computed marginals in the objective function. We then generalize GRAB to multi-positive unlabeled (MPU) learning, where multiple positive classes exist in a dataset. Extensive experiments on five real-world datasets show that GRAB achieves state-of-the-art performance, even when the true prior is given only to the competitors.
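The abstract describes an iterative procedure that alternates a marginalization step (model the graph as a Markov network and estimate marginals of the latent labels) with an update step (retrain the classifier using those marginals in the objective). The snippet below is a minimal sketch of that loop, not the authors' implementation: the neighbor-averaging stand-in for full belief propagation, the marginal-weighted risk, and all helper names (`estimate_marginals`, `marginal_weighted_risk`, `grab_sketch`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def estimate_marginals(adj, probs, positive_idx, homophily=0.9, n_rounds=5):
    """Marginalization step (simplified): treat the graph as a Markov network whose
    node potentials come from the current classifier, and run a few rounds of
    neighbor averaging as a crude stand-in for loopy belief propagation."""
    beliefs = probs.clone()
    beliefs[positive_idx] = 1.0                              # labeled positives are clamped
    for _ in range(n_rounds):
        neighbor_avg = adj @ beliefs / adj.sum(dim=1).clamp(min=1.0)
        beliefs = homophily * neighbor_avg + (1.0 - homophily) * probs
        beliefs[positive_idx] = 1.0
    return beliefs

def marginal_weighted_risk(scores, marginals, positive_idx):
    """Update step (simplified): use the estimated marginals as soft targets for the
    unlabeled nodes, so no user-supplied class prior is needed."""
    pos_loss = F.binary_cross_entropy_with_logits(
        scores[positive_idx], torch.ones_like(scores[positive_idx]))
    unl_loss = F.binary_cross_entropy_with_logits(scores, marginals.detach())
    return pos_loss + unl_loss

def grab_sketch(adj, features, positive_idx, classifier, optimizer,
                n_outer=10, n_epochs=100):
    """Alternate the two steps: estimate marginals with the current classifier,
    then retrain the classifier against the marginal-weighted objective."""
    for _ in range(n_outer):
        with torch.no_grad():
            probs = torch.sigmoid(classifier(features)).squeeze(-1)
        marginals = estimate_marginals(adj, probs, positive_idx)
        for _ in range(n_epochs):
            optimizer.zero_grad()
            scores = classifier(features).squeeze(-1)
            loss = marginal_weighted_risk(scores, marginals, positive_idx)
            loss.backward()
            optimizer.step()
    return classifier
```

Any standard node classifier (e.g., a GCN) could play the role of `classifier` here; the sketch only assumes it maps node features to one logit per node, with `adj` a dense adjacency matrix and `positive_idx` the indices of the observed positive nodes.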



Acknowledgements

This research was supported by NCSOFT Co. U Kang is the corresponding author.

Author information

Corresponding author

Correspondence to U Kang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yoo, J., Kim, J., Yoon, H. et al. Graph-based PU learning for binary and multiclass classification without class prior. Knowl Inf Syst 64, 2141–2169 (2022). https://doi.org/10.1007/s10115-022-01702-8

