Abstract
In our work, we consider classification methods based on the logistic regression model for positive and unlabeled (PU) data. We examine four methods of posterior probability estimation in which the risk of the logistic loss function is optimized: the naive approach, the weighted likelihood approach, and the more recently proposed joint approach and LassoJoint method. The objective of our study is to evaluate the accuracy, recall, precision, and F1-score of the considered classification methods. The corresponding assessments were carried out on 13 machine learning model schemes through numerical experiments on selected real datasets.
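As a rough illustration of the simplest of these strategies, the naive approach treats every unlabeled example as negative and fits ordinary logistic regression to the observed PU labels; the resulting predictions can then be scored against the (normally hidden) true labels with the four metrics named above. The following is a minimal NumPy sketch on synthetic data under the SCAR labeling assumption; the data-generating setup, the labeling frequency `c`, and all function names are our own assumptions for the sketch, not the paper's experimental design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: two Gaussian clusters, 200 positives and 200 negatives.
n = 400
X = np.vstack([rng.normal(1.0, 1.0, size=(n // 2, 2)),
               rng.normal(-1.0, 1.0, size=(n // 2, 2))])
y_true = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])

# PU labeling (SCAR): each positive is labeled with constant probability c;
# s is the observed label, 0 meaning "unlabeled", not "negative".
c = 0.7
s = np.where((y_true == 1) & (rng.random(n) < c), 1.0, 0.0)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Minimize the empirical logistic loss by plain gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(float)

# Naive approach: the observed PU label s plays the role of the class label.
w = fit_logistic(X, s)
y_hat = predict(w, X)

# Evaluation against the hidden true labels.
tp = int(np.sum((y_hat == 1) & (y_true == 1)))
fp = int(np.sum((y_hat == 1) & (y_true == 0)))
fn = int(np.sum((y_hat == 0) & (y_true == 1)))
accuracy = float(np.mean(y_hat == y_true))
precision = tp / max(tp + fp, 1)   # guard against an empty positive prediction set
recall = tp / max(tp + fn, 1)
f1 = 2 * precision * recall / max(precision + recall, 1e-12)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Because the naive model actually estimates P(s = 1 | x) ≈ c · P(y = 1 | x) rather than P(y = 1 | x) itself, its recall typically suffers; the weighted, joint, and LassoJoint approaches studied in the paper are designed to correct for exactly this bias.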
Notes
1. This research was carried out with the support of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, under computational allocation No. g88-1185.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Furmańczyk, K., Paczutkowski, K., Dudziński, M., Dziewa-Dawidczyk, D. (2022). Classification Methods Based on Fitting Logistic Regression to Positive and Unlabeled Data. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13350. Springer, Cham. https://doi.org/10.1007/978-3-031-08751-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08750-9
Online ISBN: 978-3-031-08751-6
eBook Packages: Computer Science, Computer Science (R0)