Abstract
In our work, we consider classification methods based on the logistic regression model for positive and unlabeled (PU) data. We examine four methods of posterior probability estimation in which the risk of the logistic loss function is optimized: the naive approach, the weighted likelihood approach, and the more recently proposed joint approach and LassoJoint method. The objective of our study is to evaluate the accuracy, recall, precision, and F1-score of the considered classification methods. The corresponding assessments were carried out on 13 machine learning model schemes through numerical experiments on selected real datasets.
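As a rough illustration of the simplest of these strategies, the naive approach treats every unlabeled example as negative and fits ordinary logistic regression to the observed PU labels; the resulting predictions can then be scored against the (normally hidden) true labels with the four metrics named above. The following is a minimal NumPy sketch on synthetic data under the SCAR labeling assumption; the data-generating setup, the labeling frequency `c`, and all function names are our own assumptions for the sketch, not the paper's experimental design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth: two Gaussian clusters, 200 positives and 200 negatives.
n = 400
X = np.vstack([rng.normal(1.0, 1.0, size=(n // 2, 2)),
               rng.normal(-1.0, 1.0, size=(n // 2, 2))])
y_true = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])

# PU labeling (SCAR): each positive is labeled with constant probability c;
# s is the observed label, 0 meaning "unlabeled", not "negative".
c = 0.7
s = np.where((y_true == 1) & (rng.random(n) < c), 1.0, 0.0)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Minimize the empirical logistic loss by plain gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(float)

# Naive approach: the observed PU label s plays the role of the class label.
w = fit_logistic(X, s)
y_hat = predict(w, X)

# Evaluation against the hidden true labels.
tp = int(np.sum((y_hat == 1) & (y_true == 1)))
fp = int(np.sum((y_hat == 1) & (y_true == 0)))
fn = int(np.sum((y_hat == 0) & (y_true == 1)))
accuracy = float(np.mean(y_hat == y_true))
precision = tp / max(tp + fp, 1)   # guard against an empty positive prediction set
recall = tp / max(tp + fn, 1)
f1 = 2 * precision * recall / max(precision + recall, 1e-12)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Because the naive model actually estimates P(s = 1 | x) ≈ c · P(y = 1 | x) rather than P(y = 1 | x) itself, its recall typically suffers; the weighted, joint, and LassoJoint approaches studied in the paper are designed to correct for exactly this bias.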
Notes
1. This research was carried out with the support of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, under computational allocation No. g88-1185.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Furmańczyk, K., Paczutkowski, K., Dudziński, M., Dziewa-Dawidczyk, D. (2022). Classification Methods Based on Fitting Logistic Regression to Positive and Unlabeled Data. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13350. Springer, Cham. https://doi.org/10.1007/978-3-031-08751-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08750-9
Online ISBN: 978-3-031-08751-6
eBook Packages: Computer Science, Computer Science (R0)