Self-paced Learning for Imbalanced Data

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9621)

Abstract

In this paper, we propose a novel training paradigm that combines two learning strategies: cost-sensitive learning and self-paced learning. The approach applies to decision problems in which highly imbalanced data is used during training. The main idea is to start the learning process with a large number of minority examples and only the easiest majority examples, and then gradually turn to more difficult cases. We evaluate this training paradigm against other learning schemes for a neural network model on a set of highly imbalanced benchmark datasets.
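
The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes the hard self-paced rule (an example is selected when its current loss is below a threshold) and adds a hypothetical per-class threshold so that minority examples are admitted generously while majority examples must be easy. The names `select_examples`, `lam_majority`, and `lam_minority` are invented for illustration.

```python
def select_examples(losses, labels, lam_majority, lam_minority):
    """Return a binary selection vector v in {0,1}^n.

    Assumed rule (illustrative only): an example is selected when its
    current loss is below the threshold of its class.
    Label 1 = minority (positive), label 0 = majority (negative).
    """
    selection = []
    for loss, label in zip(losses, labels):
        threshold = lam_minority if label == 1 else lam_majority
        selection.append(1 if loss < threshold else 0)
    return selection


# Toy per-example losses from some current model (made-up numbers).
losses = [0.2, 1.5, 0.9, 0.1, 2.0, 0.4]
labels = [1, 1, 0, 0, 0, 0]

# Early stage: generous minority threshold, strict majority threshold.
early = select_examples(losses, labels, lam_majority=0.3, lam_minority=3.0)

# Later stage: the majority threshold is raised, admitting harder cases.
late = select_examples(losses, labels, lam_majority=1.0, lam_minority=3.0)

print(early)
print(late)
```

In a full training loop one would alternate between fitting the model on the selected examples and recomputing the selection with a larger majority threshold; how the thresholds are scheduled is left open here.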

Notes

  1. We consider a two-class imbalanced data problem in which the minority class is assumed to be positive and the majority class negative.

  2. The absolute value can be omitted since \(\mathbf {v} \in \{0,1\}^n\).

  3. About 10–20% of the data can always be set aside for validation.

  4. AUC is defined as the arithmetic mean of the True Positive Rate (TPR, also called Sensitivity) and the True Negative Rate (TNR, also called Specificity), \(AUC = \frac{TPR+TNR}{2}\). TP, TN, FP and FN are the elements of the confusion matrix, \(TPR = \frac{TP}{TP+FN}\) and \(TNR = \frac{TN}{TN+FP}\). The AUC value can be written in this form if we consider predicted classes, not probabilities, while testing. In that case the ROC curve is represented by a single point located at (FPR, TPR), joined by straight lines to (0, 0) and (1, 1). The area under this ROC curve is \(AUC=\frac{1+TPR-FPR}{2}\). Making use of \(TNR=1-FPR\) we obtain \(AUC=\frac{1}{2}(TPR+TNR)\). In our opinion this method of calculating AUC is better suited to imbalanced data problems, because it evaluates the correctness of predictions rather than the ordering of the data used for evaluation.
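
The formula in note 4 can be checked with a short sketch. The function names and the confusion-matrix counts below are made up for illustration, and the code assumes both classes are present (so neither denominator is zero). It contrasts the single-point AUC with plain accuracy for a classifier that labels every example as negative.

```python
def single_point_auc(tp, tn, fp, fn):
    """AUC for hard class predictions: the mean of TPR and TNR."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return (tpr + tnr) / 2.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
# A classifier that always predicts "negative" gets every positive wrong.
always_negative = dict(tp=0, tn=990, fp=0, fn=10)

# A classifier that finds 8 of the 10 positives at the cost of 90 false alarms.
balanced = dict(tp=8, tn=900, fp=90, fn=2)

print(accuracy(**always_negative), single_point_auc(**always_negative))
print(accuracy(**balanced), single_point_auc(**balanced))
```

The always-negative classifier has high accuracy but a single-point AUC of one half, while the second classifier has lower accuracy and a clearly higher AUC, which is the behaviour the note argues for.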

Acknowledgments

The research conducted by the authors has been partially co-financed by the Ministry of Science and Higher Education, Republic of Poland, namely, Maciej Zięba: grant No. B50083/W8/K3, Jakub M. Tomczak: grant No. B50106/W8/K3.

Author information

Corresponding author

Correspondence to Maciej Zięba.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zięba, M., Tomczak, J.M., Świątek, J. (2016). Self-paced Learning for Imbalanced Data. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49381-6_54

  • DOI: https://doi.org/10.1007/978-3-662-49381-6_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49380-9

  • Online ISBN: 978-3-662-49381-6

  • eBook Packages: Computer Science, Computer Science (R0)
