Self-paced Learning for Imbalanced Data

Zięba, Maciej; Tomczak, Jakub M.; Świątek, Jerzy

doi:10.1007/978-3-662-49381-6_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9621)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

In this paper, we propose a novel training paradigm that combines two learning strategies: cost-sensitive learning and self-paced learning. The approach applies to decision problems in which highly imbalanced data is used during training. The main idea is to start the learning process with a large number of minority examples and only the easiest majority examples, and then gradually turn to more difficult cases. We evaluate this training paradigm against other learning schemes for a neural network model on a set of highly imbalanced benchmark datasets.
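The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not give. It assumes the common self-paced rule that an example is kept when its loss falls below a threshold, with binary weights, and it assumes a separate, generous threshold for the minority (positive) class. The names `select_examples`, `pace_schedule`, `lam_pos`, `lam_neg` and `growth` are hypothetical.

```python
def select_examples(losses, labels, lam_pos, lam_neg):
    """Return binary self-paced weights: 1 keeps an example, 0 defers it.

    Assumption: label 1 is the minority (positive) class and gets its own
    threshold lam_pos; all other labels use the majority threshold lam_neg.
    """
    weights = []
    for loss, label in zip(losses, labels):
        threshold = lam_pos if label == 1 else lam_neg
        weights.append(1 if loss < threshold else 0)
    return weights


def pace_schedule(lam_neg_start, growth, n_rounds):
    """Hypothetical geometric schedule that relaxes the majority threshold."""
    return [lam_neg_start * growth ** k for k in range(n_rounds)]


# Toy data: two minority examples (label 1) and three majority examples.
# A real training loop would recompute the losses after every model update;
# they are held fixed here to isolate the selection step.
losses = [0.2, 1.5, 0.1, 0.6, 2.0]
labels = [1, 1, 0, 0, 0]

for lam_neg in pace_schedule(0.3, 3.0, 3):
    kept = select_examples(losses, labels, lam_pos=10.0, lam_neg=lam_neg)
    print(f"lam_neg={lam_neg:.2f} kept={kept}")
```

With a large `lam_pos`, all minority examples are used from the first round, while majority examples enter only as `lam_neg` grows, which mirrors the "easy majority first" ordering described above.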

Notes

1. We consider a two-class imbalanced data problem in which the minority class is assumed to be positive and the majority class negative.
2. The absolute value can be omitted since \(\mathbf {v} \in \{0,1\}^n\).
3. We can always set aside about 10–20% of the data for validation.
4. AUC is defined as the arithmetic mean of the True Positive Rate (TPR, also called Sensitivity) and the True Negative Rate (TNR, also called Specificity), \(AUC = \frac{TPR+TNR}{2}\). Here TP, TN, FP and FN are the elements of the confusion matrix, \(TPR = \frac{TP}{TP+FN}\) and \(TNR = \frac{TN}{TN+FP}\). The AUC value can be written in this form if we consider predicted classes, not probabilities, while testing. In that case the ROC curve is represented by a single point located at (FPR, TPR), and the area under the ROC curve is \(AUC=\frac{1+TPR-FPR}{2}\). Making use of \(TNR=1-FPR\) we have \(AUC=\frac{1}{2}(TPR+TNR)\). In our opinion this method of calculating AUC is better for imbalanced data problems, because it evaluates true predictions instead of the ordering of the data used for evaluation.
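The single-point AUC in note 4 can be checked numerically with a short sketch. The function names `confusion_counts` and `single_point_auc` are hypothetical, and the code assumes that label 1 is the positive (minority) class and that both classes occur in the test labels, so neither denominator is zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for hard class predictions."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def single_point_auc(tp, tn, fp, fn):
    """AUC of a one-point ROC curve: the mean of TPR and TNR."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return (tpr + tnr) / 2.0


# Toy labels: two positives, four negatives.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
auc = single_point_auc(tp, tn, fp, fn)

# Cross-check against the equivalent form (1 + TPR - FPR) / 2.
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
assert abs(auc - (1 + tpr - fpr) / 2.0) < 1e-12
print(auc)
```

Because the two class rates are averaged with equal weight, a classifier that always predicts the majority class scores 0.5 regardless of how imbalanced the data is.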

Acknowledgments

The research conducted by the authors has been partially co-financed by the Ministry of Science and Higher Education, Republic of Poland, namely, Maciej Zięba: grant No. B50083/W8/K3, Jakub M. Tomczak: grant No. B50106/W8/K3.

Author information

Authors and Affiliations

Department of Computer Science, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Maciej Zięba, Jakub M. Tomczak & Jerzy Świątek

Corresponding author

Correspondence to Maciej Zięba.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński
Iwate Prefectural University, Takizawa, Japan
Hamido Fujita
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong

About this paper

Cite this paper

Zięba, M., Tomczak, J.M., Świątek, J. (2016). Self-paced Learning for Imbalanced Data. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49381-6_54

DOI: https://doi.org/10.1007/978-3-662-49381-6_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49380-9
Online ISBN: 978-3-662-49381-6
eBook Packages: Computer Science, Computer Science (R0)
