
Ensemble Learning for Imbalanced E-commerce Transaction Anomaly Classification

  • Conference paper
Neural Information Processing (ICONIP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5863)


Abstract

This paper presents the main results of our ongoing work, one month before the deadline, on the 2009 UC San Diego data mining contest. The contest tasks are to rank the samples in two e-commerce transaction anomaly datasets according to the probability that each sample has a positive label, and performance is evaluated by the lift at 20% on each dataset. A main difficulty is that the data are highly imbalanced: only about 2% of the samples in each task are labeled positive. We first preprocess the categorical features and normalize all features. We then report initial results for several popular classifiers, including Support Vector Machines, Neural Networks, AdaBoost, and Logistic Regression; the objective is to obtain benchmark results for these classifiers without much modification, which helps us select a classifier for further tuning. Based on these results, we observe that the area under the ROC curve (AUC) is a good indicator for improving the lift score. We therefore propose an ensemble method that combines the above classifiers to optimize the AUC and obtains significantly better results. We also discuss several treatments of the imbalanced data in our experiments.
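
To make the evaluation metric and the score-combination idea concrete, the sketch below (not from the paper) computes lift at 20% and combines the probability outputs of a few off-the-shelf scikit-learn classifiers by rank averaging on synthetic imbalanced data. The data generation, classifier settings, and rank-averaging rule are illustrative assumptions, not the authors' AUC-optimizing ensemble.

```python
# A minimal sketch, assuming synthetic data and off-the-shelf scikit-learn
# models; a simple rank-averaging combination stands in for the authors'
# AUC-driven ensemble described in the abstract.
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

def lift_at_k(y_true, scores, frac=0.2):
    """Lift at the top `frac` of the ranking: positive rate among the
    highest-scored fraction divided by the overall positive rate."""
    n_top = int(np.ceil(frac * len(scores)))
    top = np.argsort(-scores)[:n_top]
    return y_true[top].mean() / y_true.mean()

# Hypothetical imbalanced data (~2% positive), mimicking the contest setting.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Base classifiers; predicted probabilities serve as ranking scores.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),
    "adaboost": AdaBoostClassifier(),
}
scores = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    scores[name] = clf.predict_proba(X_te)[:, 1]
    print(f"{name}: lift@20% = {lift_at_k(y_te, scores[name]):.2f}")

# Rank-averaging ensemble: one plausible way to combine the scores; the
# paper's AUC-optimizing combination is not reproduced here.
ensemble = np.mean([rankdata(s) for s in scores.values()], axis=0)
print(f"ensemble: lift@20% = {lift_at_k(y_te, ensemble):.2f}")
```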





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, H., King, I. (2009). Ensemble Learning for Imbalanced E-commerce Transaction Anomaly Classification. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10677-4_98


  • DOI: https://doi.org/10.1007/978-3-642-10677-4_98

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10676-7

  • Online ISBN: 978-3-642-10677-4

  • eBook Packages: Computer Science, Computer Science (R0)
