Abstract
Electronic commerce has fundamentally changed the marketing environment. Marketers strive to gain a competitive advantage by offering engaging electronic platforms that attract visitors, and suppliers must understand and fulfill their buyers’ demands to increase their income and profit. Analyzing users’ behavior and predicting their purchase attitudes significantly influences marketing strategies. A comprehensive prediction of customers’ purchase intention is crucial to developing effective marketing strategies, which may in turn lead to much larger purchase amounts. Marketing analysts use various data mining approaches to identify consumer traits and intentions and to distinguish complex consumption patterns. This paper proposes an ensemble meta-classifier algorithm for predicting customer purchase intentions in electronic markets from clickstream data. To this end, we applied recursive feature elimination with cross-validation (RFECV) to select the most informative features and thereby maximize accuracy. We also applied hyperparameter optimization and tuning to select the best parameters for prediction, and we used Oracle as a static ensemble meta-classifier method. The results demonstrate that the Oracle algorithm, combined with supervised classifiers and RFECV feature selection, achieves higher precision and a lower error rate than conventional classifiers.
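To make the Oracle idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the Oracle as it is usually defined in the ensemble literature: for each sample, the combined output is correct whenever at least one base classifier predicts the true label. Because it consults the true labels, it is normally read as an upper bound on what the pool of classifiers can achieve. The function names, the toy labels, and the fallback to the first classifier's prediction are all illustrative assumptions; feature selection and hyperparameter tuning are outside the scope of this sketch.

```python
from typing import List, Sequence


def oracle_predict(base_predictions: Sequence[Sequence[int]],
                   y_true: Sequence[int]) -> List[int]:
    """Combine base classifier outputs in the Oracle fashion.

    base_predictions[k][i] is the label that classifier k assigns to
    session i. For each session, the true label is returned if any
    classifier predicted it. Otherwise the first classifier's label is
    returned (an assumed fallback, chosen only for illustration).
    """
    combined = []
    for i, truth in enumerate(y_true):
        votes = [preds[i] for preds in base_predictions]
        combined.append(truth if truth in votes else votes[0])
    return combined


def accuracy(y_pred: Sequence[int], y_true: Sequence[int]) -> float:
    """Fraction of sessions whose predicted label matches the true label."""
    correct = sum(1 for p, t in zip(y_pred, y_true) if p == t)
    return correct / len(y_true)


# Hypothetical toy data: 1 = session ended in a purchase, 0 = no purchase.
y_true = [1, 0, 0, 1, 0, 1]
base_predictions = [
    [1, 0, 1, 0, 0, 0],  # hypothetical classifier A
    [0, 0, 0, 1, 1, 0],  # hypothetical classifier B
    [1, 1, 0, 0, 0, 0],  # hypothetical classifier C
]

oracle_labels = oracle_predict(base_predictions, y_true)
base_accuracies = [accuracy(p, y_true) for p in base_predictions]
oracle_accuracy = accuracy(oracle_labels, y_true)

print("base accuracies:", base_accuracies)
print("oracle accuracy:", oracle_accuracy)
```

In this toy example every base classifier is right on three of six sessions, while the Oracle combination is right on five, since only the last session is misclassified by all three. In practice the base predictions would come from tuned supervised classifiers trained on the selected clickstream features.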
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
The authors declare that all listed authors approved the manuscript before submission, including the names and order of authors. All authors are responsible for the correctness of the statements provided in the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article. They did not receive support from any organization for the submitted work.
Ethical approval
The authors declare that this study does not involve human or animal participants.
About this article
Cite this article
Ehsani, F., Hosseini, M. Customer purchase prediction in electronic markets from clickstream data using the Oracle meta-classifier. Oper Res Int J 24, 11 (2024). https://doi.org/10.1007/s12351-023-00813-6
DOI: https://doi.org/10.1007/s12351-023-00813-6