Abstract
This paper proposes a data mining approach for automatic customer targeting based on customers' expected profitability. The main challenge in predicting customer profitability is the asymmetry (skewness) of its distribution: highly profitable customers are very few compared with the rest. Although data mining methods are more robust to sample heterogeneity than statistical methods, strong skewness often causes prediction accuracy to decrease as profit increases. These few customers are effectively outliers that can cause data-driven methods to overestimate predicted amounts; at the same time, they carry very important information about the most valuable customers, so removing them is not advisable. The proposed approach is designed to overcome these problems. The results show that the relative error in predicting the absolute profitability of the most valuable customers is very small and close to the error for other customers, whereas previously applied methods predicted high profitability less accurately. In practice, this accuracy enables more efficient identification of the most profitable customers, who ultimately contribute most to the company's revenue. The model's precision also reduces errors in assessing highly profitable and risky customers, which saves marketers unnecessary costs.
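The skewness problem and the relative-error comparison described above can be illustrated with a short sketch. It is not the authors' method, which the abstract does not specify; it only computes the Fisher-Pearson skewness of a profit vector and the mean relative error |y - ŷ| / |y| for the top profit segment versus all other customers. The customer profits, the predictions, the 10% cut-off, and the function names are hypothetical, and profits are assumed to be nonzero so that the relative error is defined.

```python
def sample_skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def mean_relative_error(actual, predicted):
    """Mean of |y - y_hat| / |y|; assumes no actual value is zero."""
    return sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / len(actual)


def relative_error_by_segment(actual, predicted, top_share=0.1):
    """Return (error for the top `top_share` customers by actual profit, error for the rest).

    `top_share` is a hypothetical cut-off chosen only for illustration.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need at least two paired observations")
    # Customer indices sorted from most to least profitable.
    order = sorted(range(len(actual)), key=lambda i: actual[i], reverse=True)
    # Keep at least one customer in each segment.
    k = min(max(1, int(len(actual) * top_share)), len(actual) - 1)
    top, rest = order[:k], order[k:]
    top_err = mean_relative_error([actual[i] for i in top], [predicted[i] for i in top])
    rest_err = mean_relative_error([actual[i] for i in rest], [predicted[i] for i in rest])
    return top_err, rest_err


# Hypothetical toy data: nine ordinary customers and one highly profitable outlier.
actual = [10, 12, 9, 11, 8, 10, 13, 9, 11, 200]
predicted = [11, 11, 10, 10, 9, 10, 12, 10, 10, 120]

skew = sample_skewness(actual)
top_err, rest_err = relative_error_by_segment(actual, predicted)
print(f"skewness={skew:.2f} top-segment error={top_err:.2f} rest error={rest_err:.2f}")
```

With these made-up numbers the distribution is strongly right-skewed, and the single high-profit customer has a much larger relative error than the others. That gap is the pattern the abstract attributes to earlier methods and says the proposed approach reduces.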




Data and material availability
The authors are not permitted to share the company's data.
Funding
No funding was obtained for this research.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rogić, S., Kašćelan, L., Kašćelan, V. et al. Automatic customer targeting: a data mining solution to the problem of asymmetric profitability distribution. Inf Technol Manag 23, 315–333 (2022). https://doi.org/10.1007/s10799-021-00353-5
DOI: https://doi.org/10.1007/s10799-021-00353-5