Query-Based Versus Tree-Based Classification: Application to Banking Data

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10352)

Abstract

The cornerstone of retail banking risk management is estimating the expected loss incurred when granting a loan to a borrower. The key driver of loss estimation is the borrower's probability of default (PD). Assessing PD is a classification problem. In this paper we apply FCA query-based classification techniques to open credit scoring data from Kaggle. We argue that query-based classification achieves higher classification accuracy than classical banking models while retaining the interpretability of model results, whereas black-box methods grant better accuracy but diminish interpretability.
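To make the idea of query-based (lazy) classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes numeric features, describes a pair of objects by the smallest interval per feature that contains both, and lets a training example vote for its class only if that interval description covers no object of the opposite class. The toy data, the function names, and the simple vote count are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def interval_pattern(a: Sequence[float], b: Sequence[float]) -> List[Interval]:
    """Smallest per-feature intervals that contain both objects."""
    return [(min(x, y), max(x, y)) for x, y in zip(a, b)]


def covers(pattern: Sequence[Interval], obj: Sequence[float]) -> bool:
    """True if every feature of obj lies inside the matching interval."""
    return all(lo <= v <= hi for (lo, hi), v in zip(pattern, obj))


def votes(query, own_class, other_class) -> int:
    """Count own-class examples whose pattern with the query is not
    falsified, i.e. covers no object of the other class."""
    count = 0
    for example in own_class:
        pattern = interval_pattern(query, example)
        if not any(covers(pattern, o) for o in other_class):
            count += 1
    return count


def classify(query, good, bad) -> str:
    """Assumed decision rule: the class with more unfalsified patterns wins."""
    good_votes = votes(query, good, bad)
    bad_votes = votes(query, bad, good)
    if good_votes == bad_votes:
        return "undecided"
    return "good" if good_votes > bad_votes else "bad"


# Hypothetical toy data: (age, debt ratio) per borrower.
good = [(45, 0.20), (50, 0.30), (38, 0.25)]
bad = [(23, 0.90), (27, 0.80)]

print(classify((40, 0.28), good, bad))
print(classify((25, 0.85), good, bad))
```

Each surviving interval pattern can be read as a rule ("age between 38 and 40, debt ratio between 0.25 and 0.28"), which is the kind of interpretability the abstract refers to. Nothing is trained in advance: all work is done per query, which is what makes the approach lazy.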

Notes

  1. https://www.kaggle.com/c/GiveMeSomeCredit

  2. https://www.kaggle.com/c/GiveMeSomeCredit

  3. https://github.com/dmlc/xgboost

Acknowledgments

The paper was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project ‘5-100’.

Author information

Corresponding author

Correspondence to Alexey Masyutin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Masyutin, A., Kashnitsky, Y. (2017). Query-Based Versus Tree-Based Classification: Application to Banking Data. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_65

  • DOI: https://doi.org/10.1007/978-3-319-60438-1_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60437-4

  • Online ISBN: 978-3-319-60438-1

  • eBook Packages: Computer Science, Computer Science (R0)
