Abstract
Many authoritative studies report that consumer credit has grown year on year in recent years, making it necessary to develop instruments that assist financial operators in some crucial tasks. The most important of these is classifying loan applications as reliable or unreliable on the basis of the customer information at their disposal. Such credit scoring instruments allow operators to reduce financial losses, and for this reason they play a very important role. However, designing effective credit scoring models is not an easy task, since it must face several problems, first among them the data imbalance in model training. This problem arises because the number of default cases is usually much smaller than the number of non-default ones, and this kind of distribution reduces the effectiveness of the state-of-the-art approaches used to define these models. This paper proposes a novel Linear Dependence Based (LDB) approach that builds a credit scoring model by using only the past non-default cases, overcoming both the imbalanced class distribution and the cold-start issues. It relies on the concept of linear dependence between the vector representations of the past and new loan applications, evaluating it in the context of a matrix. The experiments, performed on two real-world datasets with a strongly unbalanced distribution of data, show that the proposed approach achieves performance close to or better than that of random forests, one of the best state-of-the-art credit scoring approaches, even though it uses only past non-default cases.
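The abstract does not spell out how linear dependence is evaluated, so the following is only a minimal sketch of the underlying notion, not the LDB procedure itself. It assumes that two applications are represented as equal-length numeric vectors and stacks them as the rows of a 2 x n matrix: the vectors are linearly dependent exactly when every 2x2 minor of that matrix is zero. The function names, the tolerance `tol`, and the sample feature values are hypothetical.

```python
from itertools import combinations


def minors_2x2(u, v):
    """Return every 2x2 minor of the 2 x n matrix whose rows are u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [u[i] * v[j] - u[j] * v[i] for i, j in combinations(range(len(u)), 2)]


def is_linearly_dependent(u, v, tol=1e-9):
    """True when one vector is a scalar multiple of the other (within tol)."""
    return all(abs(m) <= tol for m in minors_2x2(u, v))


# Made-up feature vectors, for illustration only.
past_non_default = [2.0, 4.0, 6.0]
new_application = [1.0, 2.0, 3.0]   # exactly half of the vector above
unrelated = [1.0, 0.0, 5.0]

print(is_linearly_dependent(past_non_default, new_application))  # True
print(is_linearly_dependent(past_non_default, unrelated))        # False
```

Real loan data would rarely be exactly dependent, so a practical model would presumably need a graded measure rather than this yes/no test; how the paper derives such a measure is not stated in the abstract.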
Notes
- 1.
When one of the vectors is a scalar multiple of the other.
Acknowledgements
This research is partially funded by Regione Sardegna under project “Next generation Open Mobile Apps Development” (NOMAD), “Pacchetti Integrati di Agevolazione” (PIA) - Industria Artigianato e Servizi - Annualità 2013.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Saia, R., Carta, S. (2019). Introducing a Vector Space Model to Perform a Proactive Credit Scoring. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Bernardino, J., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2016. Communications in Computer and Information Science, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-319-99701-8_6
DOI: https://doi.org/10.1007/978-3-319-99701-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99700-1
Online ISBN: 978-3-319-99701-8
eBook Packages: Computer Science (R0)