Deep customer segmentation with applications to a Vietnamese supermarkets’ data

Soft Computing

Abstract

A central problem in customer relationship management (CRM) is clustering customers into meaningful groups. This problem, commonly called customer segmentation, has grown in importance in the twenty-first century because the rapid development of e-commerce generates databases containing millions of customers. Recent machine learning algorithms have successfully clustered a wide range of datasets, including images, text documents and news. Inspired by those accomplishments, we design a new segmentation model that combines a deep neural network with a self-supervised probabilistic clustering technique. The new model is more flexible and adapts better to the diversity of customer datasets than the heuristic algorithms currently used in CRM. The model's inputs come from feature engineering, the process of cleaning, preparing and transforming raw data into features from which clusters are produced. To perform feature engineering, we combine a novel categorical encoding method from economics with an autoencoder, a machine learning data transformation method, to extract useful patterns from the original data. Our experiments with the full model on retail transaction data from a supermarket chain in Ho Chi Minh City, Vietnam, show that the algorithm can produce useful, explainable customer clusters.
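The abstract does not spell out the self-supervised probabilistic clustering step, so the following is only a minimal sketch of one well-known technique of that kind: soft cluster assignment with a Student's t kernel and a sharpened target distribution, in the style of deep embedded clustering. It is an assumption made for illustration, not necessarily the method used in the paper. The function names, the toy two-dimensional "embeddings" (standing in for autoencoder outputs) and the fixed centroids are all hypothetical.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft-assign embeddings z (n, d) to centroids (k, d) with a Student's t kernel."""
    # Squared Euclidean distance between every point and every centroid: shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalise each row so it is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpen q: square it, divide by soft cluster frequency, renormalise rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters (the self-training loss)."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical toy data: two well-separated groups of 2-D embeddings.
rng = np.random.default_rng(0)
z = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(3.0, 0.3, size=(50, 2)),
])
# Assumed initial centroids; in practice these might come from k-means.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assign(z, centroids)   # soft cluster memberships
p = target_distribution(q)      # self-generated training targets
loss = kl_divergence(p, q)      # quantity a network would minimise
labels = q.argmax(axis=1)       # hard cluster label per point
```

In a full pipeline under these assumptions, the loss would be minimised by gradient descent over both the encoder weights and the centroids, with the targets `p` recomputed periodically from the current `q`; that is what makes the procedure self-supervised.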

Acknowledgements

The author is thankful to Dien H. Le and Nhat Q. Truong at the University of Economics and Law for coordinating the data collection process and for digitizing the customer receipts. The author highly appreciates the referees’ very helpful comments.

Author information

Corresponding author

Correspondence to Son P. Nguyen.

Ethics declarations

Conflict of Interest:

Author Son P. Nguyen declares that he has no conflict of interest.

Ethical approval:

This article does not contain any studies with animals performed by any of the authors.

Additional information

Communicated by Vladik Kreinovich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nguyen, S.P. Deep customer segmentation with applications to a Vietnamese supermarkets’ data. Soft Comput 25, 7785–7793 (2021). https://doi.org/10.1007/s00500-021-05796-0

  • DOI: https://doi.org/10.1007/s00500-021-05796-0
