Customer purchase prediction in electronic markets from clickstream data using the Oracle meta-classifier

  • Original Paper
  • Operational Research

Abstract

Electronic commerce has fundamentally changed the marketing environment. Marketers strive to gain a competitive advantage by offering engaging electronic platforms that attract visitors, and suppliers must understand and meet their buyers’ demands to increase income and profit. Analyzing users’ behavior and predicting their purchase intentions significantly influences marketing strategies. A comprehensive prediction of customers’ purchase intention is crucial to developing good marketing strategies, which may in turn lead to much larger purchase amounts. Marketing analysts use various data mining approaches to identify consumer traits and intentions and to distinguish complex consumption patterns. This paper proposes an ensemble meta-classifier algorithm for predicting customer purchase intentions in electronic markets from clickstream data. To this end, we applied recursive feature elimination with cross-validation (RFECV) to select the most useful features and thereby achieve the highest accuracy. We also applied hyperparameter optimization and tuning to select the best parameters for prediction, and we used Oracle as a static ensemble meta-classifier method. The results demonstrate that combining the Oracle algorithm with supervised classifiers and RFECV feature selection achieves higher precision and a lower error rate than conventional classifiers.
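
To make the Oracle idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the definition commonly used in the ensemble literature: for each sample, the Oracle counts as correct whenever at least one base classifier predicts the true label. Because this requires the true labels, the Oracle score is usually read as an upper bound on what selecting among the base classifiers could achieve. The function names (oracle_predict, accuracy), the three base classifiers, and the toy labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def oracle_predict(base_predictions: Sequence[Sequence[int]],
                   y_true: Sequence[int]) -> List[int]:
    """Return one Oracle label per sample.

    base_predictions[k][i] is the label base classifier k assigns to sample i.
    The Oracle returns the true label if any base classifier predicted it;
    otherwise it falls back to the first classifier's (wrong) prediction.
    """
    n_samples = len(y_true)
    for preds in base_predictions:
        if len(preds) != n_samples:
            raise ValueError("every classifier must predict every sample")

    oracle = []
    for i, truth in enumerate(y_true):
        votes = [preds[i] for preds in base_predictions]
        oracle.append(truth if truth in votes else votes[0])
    return oracle


def accuracy(y_pred: Sequence[int], y_true: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


# Made-up toy data: 1 = session ended in a purchase, 0 = no purchase.
y_true = [1, 0, 0, 1, 0, 1]
base_predictions = [
    [1, 0, 1, 0, 0, 0],  # hypothetical classifier A
    [0, 0, 0, 0, 1, 0],  # hypothetical classifier B
    [1, 1, 0, 0, 0, 0],  # hypothetical classifier C
]

oracle_labels = oracle_predict(base_predictions, y_true)
individual = [accuracy(p, y_true) for p in base_predictions]
oracle_acc = accuracy(oracle_labels, y_true)

print("individual accuracies:", individual)
print("oracle accuracy:", oracle_acc)
```

In this toy case the Oracle is wrong only on the two sessions that every base classifier misses, so its accuracy can never fall below that of the best individual classifier. The feature selection and hyperparameter tuning steps named in the abstract would take place before this stage, when the base classifiers are trained.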

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Contributions

The authors declare that all listed authors have approved the manuscript before submission, including the names and order of authors. All authors are responsible for the correctness of the statements provided in the manuscript.

Corresponding author

Correspondence to Monireh Hosseini.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article. They did not receive support from any organization for the submitted work.

Ethical approval

The authors declare that this study does not involve human or animal participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ehsani, F., Hosseini, M. Customer purchase prediction in electronic markets from clickstream data using the Oracle meta-classifier. Oper Res Int J 24, 11 (2024). https://doi.org/10.1007/s12351-023-00813-6

  • DOI: https://doi.org/10.1007/s12351-023-00813-6
