
Comparative Evaluation of the Supervised Machine Learning Classification Methods and the Concept Drift Detection Methods in the Financial Business Problems

  • Conference paper
Enterprise Information Systems (ICEIS 2020)

Abstract

Machine Learning methods are key tools for aiding decision making in financial business problems, such as risk analysis, fraud detection, and credit-granting evaluation, reducing time and effort while increasing accuracy. Supervised machine learning classification methods learn patterns in data to improve prediction. Over the long term, the data patterns may change in a process known as concept drift; such changes require retraining the classification methods to maintain their accuracy. We conducted a comparative study using twelve classification methods and seven concept drift detection methods. The evaluated classification methods are Gaussian and Incremental Naïve Bayes, Logistic Regression, Support Vector Classifier, k-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting, XGBoost, Multilayer Perceptron, Stochastic Gradient Descent, and Hoeffding Tree. The analyzed concept drift detection methods are ADWIN, DDM, EDDM, HDDMa, HDDMw, KSWIN, and Page-Hinkley. We used the next-generation hyperparameter optimization framework Optuna, applied the non-parametric Friedman test to infer hypotheses, and used the Nemenyi post-hoc test to validate the results. We used five datasets in the financial domain. Using F1 and AUROC scores as performance metrics, XGBoost outperformed the other methods in the classification experiments. In the data stream experiments with concept drift, using accuracy as the performance metric, Hoeffding Tree and XGBoost showed the best results with the HDDMw, KSWIN, and ADWIN concept drift detection methods. We conclude that XGBoost with HDDMw is the recommended combination for financial datasets that exhibit concept drift.
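The drift detectors compared in the abstract all follow the same pattern: monitor a stream of per-sample prediction errors and raise an alarm when the error distribution shifts, triggering retraining. As a hedged illustration, here is a minimal, self-contained sketch of the Page-Hinkley test, one of the seven detectors evaluated; the parameter names and defaults (`delta`, `threshold`, `min_instances`) are illustrative choices for this sketch, not the settings used in the paper.

```python
class PageHinkley:
    """Minimal Page-Hinkley drift detector for a stream of values
    (e.g. 0/1 prediction errors); it flags an increase in the mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta                # tolerance for small fluctuations
        self.threshold = threshold        # alarm threshold (lambda)
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                    # m_t: cumulative deviation from the running mean
        self.min_cum = 0.0                # M_t: minimum of m_t seen so far

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n          # incremental mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.n < self.min_instances:
            return False
        if self.cum - self.min_cum > self.threshold:
            self.reset()                               # signal drift, restart monitoring
            return True
        return False


# Demo: an error stream with an abrupt change at index 200 (0 = correct, 1 = error).
stream = [0] * 200 + [1] * 100
ph = PageHinkley()
drift_at = next((i for i, x in enumerate(stream) if ph.update(x)), None)
print("drift signalled at stream index", drift_at)
```

On an alarm, a practitioner would typically retrain the classifier (for example, XGBoost) on recent data, which is the retraining strategy the abstract motivates; in practice, library implementations of these detectors (e.g. in a stream-learning framework) would be used rather than a hand-rolled one.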




Author information

Correspondence to Victor Ulisses Pugliese, Renato Duarte Costa, or Celso Massaki Hirata.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Pugliese, V.U., Costa, R.D., Hirata, C.M. (2021). Comparative Evaluation of the Supervised Machine Learning Classification Methods and the Concept Drift Detection Methods in the Financial Business Problems. In: Filipe, J., Śmiałek, M., Brodsky, A., Hammoudi, S. (eds) Enterprise Information Systems. ICEIS 2020. Lecture Notes in Business Information Processing, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-75418-1_13


  • DOI: https://doi.org/10.1007/978-3-030-75418-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75417-4

  • Online ISBN: 978-3-030-75418-1

  • eBook Packages: Computer Science, Computer Science (R0)
