Abstract
Positive-Unlabelled (PU) learning is a growing area of machine learning that aims to learn classifiers from data consisting of labelled positive and unlabelled instances. Whilst much work has been done proposing methods for PU learning, little has been written on the subject of evaluating these methods. Many standard classification metrics cannot be precisely calculated due to the absence of fully labelled data, so alternative approaches must be taken. This short commentary paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers and provides practical recommendations for improvements in this area.
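As a concrete illustration of the evaluation problem the abstract describes: precision and F1 cannot be computed exactly without labelled negatives, but recall can be estimated on a held-out set of labelled positives, and the fraction of instances predicted positive can be measured on the full unlabelled set. Lee and Liu (2003) showed that the ratio r² / Pr(ŷ = 1) is proportional to precision × recall, so it can rank PU classifiers much as F1 would. The sketch below (with hypothetical function and argument names) computes this criterion from 0/1 predictions:

```python
def pu_f1_proxy(pred_on_labelled_pos, pred_on_all):
    """Lee & Liu (2003) PU evaluation criterion: r^2 / Pr(y_hat = 1).

    pred_on_labelled_pos: 0/1 predictions on held-out labelled positives,
        used to estimate recall r (true positive rate).
    pred_on_all: 0/1 predictions on the full, mostly unlabelled,
        evaluation set, used to estimate Pr(y_hat = 1).
    """
    recall = sum(pred_on_labelled_pos) / len(pred_on_labelled_pos)
    prob_pred_pos = sum(pred_on_all) / len(pred_on_all)
    if prob_pred_pos == 0:
        # Classifier predicts nothing positive; criterion is zero.
        return 0.0
    # Proportional to precision * recall, computable without negatives.
    return recall ** 2 / prob_pred_pos
```

This is one of the alternative evaluation approaches the paper surveys; it trades exact precision for a quantity that is estimable from positive and unlabelled data alone.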