Abstract
A central problem in customer relationship management (CRM) is clustering customers into meaningful groups. This problem, often called customer segmentation, has grown in importance with the rapid development of e-commerce, which generates databases containing millions of customers. Recent machine learning algorithms have been successful in clustering a wide range of datasets, such as images, text documents and news. Inspired by those accomplishments, we design a new segmentation model that combines a deep neural network with a self-supervised probabilistic clustering technique. The new model is more flexible and adapts better to the diversity of customer datasets than current heuristic algorithms in CRM. The model depends on feature engineering, the process of cleaning, preparing and transforming raw data into features that are then fed into a model to produce clusters. To perform feature engineering, we combine a novel categorical encoding method from economics with an autoencoder, a recent machine learning data transformation method, to extract useful patterns from the original data. Our experiments with the full model on retail transaction data from a supermarket chain in Ho Chi Minh City, Vietnam, show that the algorithm can produce useful, explainable customer clusters.
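The abstract does not spell out the clustering objective. As a rough illustration only, one widely used form of self-supervised probabilistic clustering on learned embeddings computes soft cluster assignments with a Student's t kernel and then trains against a sharpened version of those same assignments. The sketch below assumes that form; it is not taken from the paper. The function names, the toy embeddings and the centroids are hypothetical, and the embeddings stand in for what an autoencoder would output.

```python
import math

def soft_assign(embeddings, centroids, alpha=1.0):
    """Soft assignment q[i][j] of point i to cluster j (Student's t kernel)."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])  # each row sums to 1
    return q

def target_distribution(q):
    """Sharpened targets: p[i][j] proportional to q[i][j]^2 / sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters; the self-supervised loss."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )

# Hypothetical 2-D embeddings (assumed autoencoder outputs) and two centroids.
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)

# Hard cluster label per customer: the cluster with the highest soft assignment.
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
```

In a full training loop under this assumption, the loss would be minimized with respect to both the encoder weights and the centroids, and the targets would be recomputed periodically from the current soft assignments.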
Acknowledgements
The author is thankful to Dien H. Le and Nhat Q. Truong at the University of Economics and Law for coordinating the data collection process and for digitizing the customer receipts. The author highly appreciates the referees’ very helpful comments.
Ethics declarations
Conflict of Interest:
Author Son P. Nguyen declares that he has no conflict of interest.
Ethical approval:
This article does not contain any studies with animals performed by any of the authors.
Additional information
Communicated by Vladik Kreinovich.
About this article
Cite this article
Nguyen, S.P. Deep customer segmentation with applications to a Vietnamese supermarkets’ data. Soft Comput 25, 7785–7793 (2021). https://doi.org/10.1007/s00500-021-05796-0
DOI: https://doi.org/10.1007/s00500-021-05796-0