Improving ANNs Performance on Unbalanced Data with an AUC-Based Learning Algorithm

Castro, Cristiano L.; Braga, Antônio P.

doi:10.1007/978-3-642-33266-1_39


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7553)

Abstract

This paper investigates the use of the Area Under the ROC Curve (AUC) as an alternative criterion for model selection in classification problems with unbalanced datasets. A novel algorithm, named here AUCMLP, which incorporates AUC optimization into the learning process of Multi-layer Perceptrons (MLPs), is presented. The basic principle of AUCMLP is the solution of an optimization problem that targets both ranking quality and the separability of the class distributions with respect to the decision threshold. Preliminary results on real data indicate that our approach is promising and can lead to better decision surfaces, especially under more severe imbalance conditions.
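The abstract does not spell out the AUCMLP objective, so the following is only a hedged illustration of the general idea of optimizing AUC directly, not the authors' algorithm. It shows (1) the empirical AUC computed as the Wilcoxon-Mann-Whitney statistic, i.e. the fraction of (positive, negative) pairs ranked correctly, and (2) gradient ascent on a sigmoid-smoothed version of that pair count, a common differentiable stand-in for AUC. Assumptions: the helper names `empirical_auc`, `fit_smoothed_auc` and `dot`, the smoothing parameter `beta`, the learning rate, and the toy data are all hypothetical; the scorer is linear rather than an MLP; and AUCMLP's threshold-separability term is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    hits = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                hits += 1.0
            elif sp == sn:
                hits += 0.5
    return hits / (len(pos) * len(neg))


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_smoothed_auc(
    X: List[List[float]], y: List[int],
    beta: float = 5.0, lr: float = 0.1, epochs: int = 200,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: gradient ascent on the mean of
    sigmoid(beta * w.(x_p - x_n)) over all positive/negative pairs,
    a smooth surrogate of the pair-counting AUC (linear scorer, not an MLP)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    n_pairs = len(pos) * len(neg)
    w = [0.0] * len(X[0])
    history = []  # smoothed-AUC objective before each update
    for _ in range(epochs):
        grad = [0.0] * len(w)
        objective = 0.0
        for xp in pos:
            for xn in neg:
                diff = [a - b for a, b in zip(xp, xn)]
                s = sigmoid(beta * dot(w, diff))
                objective += s
                # d/dw sigmoid(beta * w.diff) = beta * s * (1 - s) * diff
                coef = beta * s * (1.0 - s)
                for j, d in enumerate(diff):
                    grad[j] += coef * d
        history.append(objective / n_pairs)
        w = [wj + lr * gj / n_pairs for wj, gj in zip(w, grad)]
    return w, history


if __name__ == "__main__":
    # Toy unbalanced set: 2 positives vs 6 negatives (illustrative data only).
    X = [[2.0, 0.1], [1.5, -0.2],
         [0.0, 0.3], [-0.5, -0.1], [0.2, 0.0], [-1.0, 0.2], [0.5, -0.3], [0.3, 0.1]]
    y = [1, 1, 0, 0, 0, 0, 0, 0]
    w, history = fit_smoothed_auc(X, y)
    print(empirical_auc([dot(w, x) for x in X], y))  # 1.0
```

Because the surrogate depends only on score differences between positive and negative examples, it rewards correct ranking no matter how few positives there are, which is why pairwise criteria of this kind are attractive for unbalanced data. A bias term would cancel out of every difference, which is one reason an AUC-only objective says nothing about where to place the decision threshold.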



Author information

Authors and Affiliations

Department of Computer Science, Federal University of Lavras, 37200-000, Lavras, MG, Brazil
Cristiano L. Castro & Antônio P. Braga


Editor information

Editors and Affiliations

Neuro Heuristic Research Group, University of Lausanne, 1015, Lausanne, Switzerland
Alessandro E. P. Villa
Department of Informatics, Nicolaus Copernicus University, 87-100, Toruń, Poland
Włodzisław Duch
Center for Complex Systems Studies, Kalamazoo College, 49006, Kalamazoo, MI, USA
Péter Érdi
Dipartimento di Informatica e Scienze dell’Informazione, Università di Genova, 16146, Genoa, Italy
Francesco Masulli
Institut für Neuroinformatik, Universität Ulm, 89069, Ulm, Germany
Günther Palm


About this paper

Cite this paper

Castro, C.L., Braga, A.P. (2012). Improving ANNs Performance on Unbalanced Data with an AUC-Based Learning Algorithm. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33266-1_39


DOI: https://doi.org/10.1007/978-3-642-33266-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33265-4
Online ISBN: 978-3-642-33266-1
eBook Packages: Computer Science, Computer Science (R0)
