
Classification Methods Based on Fitting Logistic Regression to Positive and Unlabeled Data

  • Conference paper
  • First Online:
Computational Science – ICCS 2022 (ICCS 2022)

Abstract

In our work, we consider classification methods based on the logistic regression model for positive and unlabeled (PU) data. We examine four methods of posterior probability estimation in which the risk of the logistic loss function is optimized: the naive approach, the weighted likelihood approach, and the recently proposed joint approach and LassoJoint method. The objective of our study is to evaluate the accuracy, recall, precision, and F1-score of the considered classification methods. The corresponding assessments were carried out on 13 machine learning model schemes through numerical experiments on selected real datasets.
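To make the setting concrete, the naive approach from the abstract can be sketched in a few lines: treat every unlabeled example as a negative, fit ordinary logistic regression to the observed PU label, and rescale the fitted probability by the labeling frequency c. This is a minimal illustrative sketch on synthetic data under the SCAR (selected completely at random) assumption, with c assumed known; it is not the paper's experimental code, and all names and parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic PU data: true labels y in {0, 1}; only a fraction c of the
# positives is labeled (s = 1), everything else is unlabeled (s = 0).
n, c = 2000, 0.5                       # sample size and labeling frequency
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
s = y * (rng.random(n) < c)            # observed PU label (SCAR)

def fit_logistic(X, t, lr=0.1, iters=2000):
    """Minimize the logistic-loss risk by plain gradient descent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # sigmoid predictions
        w -= lr * Xb.T @ (p - t) / len(t)       # gradient of mean log-loss
    return w

# Naive approach: fit s ~ X as if unlabeled examples were negatives,
# then divide the estimated p(s=1|x) by c to approximate p(y=1|x).
w = fit_logistic(X, s)
p_s = 1.0 / (1.0 + np.exp(-np.hstack([np.ones((n, 1)), X]) @ w))
y_hat = (p_s / c > 0.5).astype(int)

# The four evaluation metrics from the abstract, against the true labels.
tp = np.sum((y_hat == 1) & (y == 1))
fp = np.sum((y_hat == 1) & (y == 0))
fn = np.sum((y_hat == 0) & (y == 1))
accuracy = np.mean(y_hat == y)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

In practice c is unknown and must itself be estimated (see reference 11); the weighted, joint, and LassoJoint methods studied in the paper avoid or refine this naive correction in different ways.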


Notes

  1.

    This research was carried out with the support of the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM) at the University of Warsaw, under computational allocation No. g88-1185.

  2.

    http://github.com/kfurmanczyk/ICCS22.

References

  1. Bekker, J., Davis, J.: Learning from positive and unlabeled data: a survey. Mach. Learn. 109(4), 719–760 (2020). https://doi.org/10.1007/s10994-020-05877-5


  2. Dua, D., Graff, C.: UCI Machine Learning Repository (2019). http://archive.ics.uci.edu/ml. University of California, Irvine, School of Information and Computer Science

  3. Friedman, J., Hastie, T., Simon, N., Tibshirani, R.: Glmnet: Lasso and elastic-net regularized generalized linear models. R package version 2.0 (2015)


  4. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22, (2010). https://www.jstatsoft.org/v33/i01/

  5. Furmańczyk, K., Dudziński, M., Dziewa-Dawidczyk, D.: Some proposal of the high dimensional PU learning classification procedure. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds.) ICCS 2021. LNCS, vol. 12744, pp. 18–25. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77967-2_2


  6. Guo, T., Xu, C., Huang, J., Wang, Y., Shi, B., Xu, C., Tao, D.: On positive-unlabeled classification in GAN. In: CVPR, pp. 8385–8393 (2020)


  7. Hastie, T., Fithian, W.: Inference from presence-only data; the ongoing controversy. Ecography 36, 864–867 (2013)


  8. Hou, M., Chaib-draa, B., Li, C., Zhao, Q.: Generative adversarial positive-unlabeled learning. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pp. 4864–4873 (2018)


  9. Kuhn, M.: caret: Classification and Regression Training. R package version 6.0-90 (2021). https://CRAN.R-project.org/package=caret

  10. Lee, W.S., Liu, B.: Learning with positive and unlabeled examples using weighted logistic regression. In: ICML, Washington, D.C., AAAI Press, pp. 448–455, August 2003


  11. Łazęcka, M., Mielniczuk, J., Teisseyre, P.: Estimating the class prior for positive and unlabelled data via logistic regression. Adv. Data Anal. Classif. 15(4), 1039–1068 (2021). https://doi.org/10.1007/s11634-021-00444-9


  12. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-9 (2021). https://CRAN.R-project.org/package=e1071

  13. Mordelet, F., Vert, J.P.: A bagging SVM to learn from positive and unlabeled examples. Pattern Recogn. Lett. 37, 201–209 (2013)


  14. Sansone, E., De Natale, F. G. B., Zhou, Z.H.: Efficient training for positive unlabeled learning. TPAMI 41(11), 2584–2598 (2018)


  15. Teisseyre, P., Mielniczuk, J., Łazęcka, M.: Different strategies of fitting logistic regression for positive and unlabelled data. In: Krzhizhanovskaya, V.V., et al. (eds.) ICCS 2020. LNCS, vol. 12140, pp. 3–17. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50423-6_1


  16. Teisseyre, P.: Pulogistic repository. https://github.com/teisseyrep/Pulogistic. Accessed 25 Jan 2022

  17. Teisseyre, P.: PU_class_prior repository. https://github.com/teisseyrep/PU_class_prior. Accessed 25 Jan 2022

  18. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B 58, 267–288 (1996)


  19. Yang, P., Liu, W., Yang, J.: Positive unlabeled learning via wrapper-based adaptive sampling. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3272–3279 (2017)


  20. Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J.: AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications. IEEE Trans. Cybern. 49(5), 1932–1943 (2018). https://doi.org/10.1109/TCYB.2018.2816984

  21. Yang, P.: AdaSampling: Adaptive Sampling for Positive Unlabeled and Label Noise Learning. R package version 1.3 (2019). https://CRAN.R-project.org/package=AdaSampling


Author information

Corresponding author

Correspondence to Konrad Furmańczyk.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 132 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Furmańczyk, K., Paczutkowski, K., Dudziński, M., Dziewa-Dawidczyk, D. (2022). Classification Methods Based on Fitting Logistic Regression to Positive and Unlabeled Data. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13350. Springer, Cham. https://doi.org/10.1007/978-3-031-08751-6_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-08751-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08750-9

  • Online ISBN: 978-3-031-08751-6

  • eBook Packages: Computer Science, Computer Science (R0)
