Abstract
This paper investigates the use of the Area Under the ROC Curve (AUC) as an alternative criterion for model selection in classification problems with unbalanced datasets. A novel algorithm, named here AUCMLP, which incorporates AUC optimization into the learning process of Multi-layer Perceptrons (MLPs), is presented. The basic principle of AUCMLP is the solution of an optimization problem that targets both ranking quality and the separability of the class distributions with respect to the decision threshold. Preliminary results on real data indicate that our approach is promising and can lead to better decision surfaces, especially under more severe imbalance conditions.
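The AUC can be read as the probability that a randomly drawn positive example receives a higher score than a randomly drawn negative one, which is the normalized Wilcoxon-Mann-Whitney statistic. The Python sketch below is not the AUCMLP algorithm. It computes that pairwise AUC estimate and a hypothetical smooth objective that, in the spirit of the abstract, combines a ranking term with a term rewarding separation of the classes around a decision threshold. The names `empirical_auc`, `surrogate_loss`, and the parameters `threshold`, `beta`, and `lam` are illustrative assumptions, not taken from the paper.

```python
import math


def empirical_auc(scores, labels):
    """Wilcoxon-Mann-Whitney estimate of the AUC: the fraction of
    (positive, negative) pairs in which the positive example gets the higher
    score, with ties counted as one half. Labels are 1 (positive) or 0."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def surrogate_loss(scores, labels, threshold=0.0, beta=5.0, lam=0.5):
    """Hypothetical differentiable objective (an assumption, not AUCMLP's):
    a sigmoid relaxation of 1 - AUC over all positive/negative pairs, plus a
    weighted term penalizing examples on the wrong side of `threshold`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    # Ranking term: close to 0 when every positive outscores every negative.
    rank = sum(_sigmoid(-beta * (p - n)) for p in pos for n in neg)
    rank /= len(pos) * len(neg)
    # Separation term: averaged per class, so the minority class weighs as
    # much as the majority class regardless of the class ratio.
    sep_pos = sum(_sigmoid(-beta * (p - threshold)) for p in pos) / len(pos)
    sep_neg = sum(_sigmoid(beta * (n - threshold)) for n in neg) / len(neg)
    return rank + lam * 0.5 * (sep_pos + sep_neg)


if __name__ == "__main__":
    # 95:5 imbalance: a constant score, thresholded so that every example is
    # labeled negative, is 95% accurate, yet its AUC is only 0.5 because
    # every positive/negative pair is a tie.
    labels = [0] * 95 + [1] * 5
    constant = [0.0] * 100
    print(empirical_auc(constant, labels))  # 0.5
```

The demo shows why accuracy can be misleading on unbalanced data while the AUC is not fooled by a degenerate classifier. Both functions compare every positive with every negative, so their cost grows as the product of the two class sizes; the per-class averaging in the separation term is one possible design choice for keeping the minority class from being swamped.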
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Castro, C.L., Braga, A.P. (2012). Improving ANNs Performance on Unbalanced Data with an AUC-Based Learning Algorithm. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33266-1_39
DOI: https://doi.org/10.1007/978-3-642-33266-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33265-4
Online ISBN: 978-3-642-33266-1
eBook Packages: Computer Science, Computer Science (R0)