Improving ANNs Performance on Unbalanced Data with an AUC-Based Learning Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7553)

Abstract

This paper investigates the use of the Area Under the ROC Curve (AUC) as an alternative criterion for model selection in classification problems with unbalanced datasets. A novel algorithm, named here AUCMLP, which incorporates AUC optimization into the learning process of Multi-layer Perceptrons (MLPs), is presented. The basic principle of AUCMLP is the solution of an optimization problem that targets both ranking quality and the separability of the class distributions with respect to the decision threshold. Preliminary results on real data indicate that our approach is promising and can lead to better decision surfaces, especially under more severe imbalance conditions.
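The abstract does not state the AUCMLP objective itself, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It computes the empirical AUC as the Wilcoxon-Mann-Whitney statistic (the fraction of positive/negative pairs that are ranked correctly). It also computes a generic sigmoid relaxation of that statistic, which a gradient-trained MLP could minimize instead of squared error. The function names and the steepness parameter `beta` are hypothetical. The surrogate is a standard pairwise relaxation and is not the loss proposed in the paper.

```python
"""Illustrative sketch (not the paper's AUCMLP): empirical AUC via the
Wilcoxon-Mann-Whitney statistic and a smooth pairwise surrogate of it."""
import math
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of positive (label 1) and negative (label 0) examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc_wmw(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC: share of (positive, negative) pairs in which the positive
    gets the higher score; ties count as half a correct pair."""
    pos, neg = _split(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def sigmoid_auc_loss(scores: Sequence[float], labels: Sequence[int], beta: float = 5.0) -> float:
    """Differentiable surrogate: 1 - mean sigmoid(beta * (s_pos - s_neg)).
    The step function used by auc_wmw is replaced by a sigmoid; 'beta' is a
    hypothetical steepness parameter (larger values approach the step)."""
    pos, neg = _split(scores, labels)
    total = sum(_sigmoid(beta * (p - n)) for p in pos for n in neg)
    return 1.0 - total / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of examples classified correctly at a fixed decision threshold."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # One positive among ten examples: a severely unbalanced toy set.
    labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    constant = [0.0] * 10  # degenerate model: same score for every example
    ranked = [0.8, 0.1, 0.2, 0.3, 0.9, 0.05, 0.4, 0.15, 0.25, 0.35]
    print(accuracy(constant, labels), auc_wmw(constant, labels))  # 0.9 0.5
    print(auc_wmw(ranked, labels))  # 8/9, about 0.889
    print(sigmoid_auc_loss(constant, labels), sigmoid_auc_loss(ranked, labels))
```

In this toy run, the constant-score model looks good by accuracy (0.9) but ranks no better than chance (AUC 0.5). This gap is what motivates using AUC rather than error rate for model selection on unbalanced data. The double loop costs O(P·N) for P positive and N negative examples. That is acceptable for a sketch, and rank-based formulas compute the same statistic in O(n log n).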

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Castro, C.L., Braga, A.P. (2012). Improving ANNs Performance on Unbalanced Data with an AUC-Based Learning Algorithm. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33266-1_39

  • DOI: https://doi.org/10.1007/978-3-642-33266-1_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33265-4

  • Online ISBN: 978-3-642-33266-1

  • eBook Packages: Computer Science, Computer Science (R0)
