research-article

Repeat Buyer Prediction for E-Commerce

Authors:

Wei Chen

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 155-164

https://doi.org/10.1145/2939672.2939674

Published: 13 August 2016

Abstract

Merchants often acquire a large number of new buyers during promotions. However, many of these buyers are one-time deal hunters, and the promotions may have little lasting impact on sales. It is therefore important for merchants to identify which buyers can be converted into regular, loyal buyers and then target them, reducing promotion costs and increasing the return on investment (ROI). At the International Joint Conference on Artificial Intelligence (IJCAI) 2015, Alibaba hosted an international competition on repeat buyer prediction based on sales data from the 2014 "Double 11" shopping event at Tmall.com. We won first place in stage 1 of the competition, out of 753 teams. In this paper, we present our winning solution, which consists of comprehensive feature engineering and model training. Through extensive feature engineering, we created profiles for users, merchants, brands, categories, items, and their interactions. These profiles are useful not only for this particular prediction task but also for other important e-commerce tasks, such as customer segmentation, product recommendation, and customer base augmentation for brands. Feature engineering is often the most important factor in the success of a prediction task, yet little work in the literature addresses feature engineering for prediction tasks in e-commerce. Our work provides useful hints and insights for data science practitioners in e-commerce.
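The abstract describes building profiles for users, merchants, brands, categories, items, and their interactions from shopping logs. The Python sketch below illustrates that general idea for users, merchants, and user-merchant pairs; it is not the authors' feature set. It assumes a hypothetical log of `(user_id, item_id, merchant_id, day, action)` tuples with action codes 0 = click, 1 = add-to-cart, 2 = purchase, 3 = add-to-favourites (assumed here to mirror the competition data), and computes a few illustrative counts and ratios. Brand, category, and item profiles could be aggregated the same way by keying on those ids; the repeat-buyer label and the classifier trained on these features are outside the scope of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical action codes; assumed to mirror the competition log convention.
CLICK, ADD_TO_CART, PURCHASE, FAVOURITE = 0, 1, 2, 3


@dataclass
class Profile:
    """Aggregated activity for one entity: a user, a merchant, or a user-merchant pair."""
    actions: list = field(default_factory=lambda: [0, 0, 0, 0])  # counts indexed by action code
    items: set = field(default_factory=set)  # distinct item ids seen
    days: set = field(default_factory=set)   # distinct active days

    def add(self, item_id, day, action):
        self.actions[action] += 1
        self.items.add(item_id)
        self.days.add(day)

    def features(self):
        clicks, carts, purchases, favourites = self.actions
        total = sum(self.actions)
        return {
            "clicks": clicks,
            "carts": carts,
            "purchases": purchases,
            "favourites": favourites,
            "distinct_items": len(self.items),
            "active_days": len(self.days),
            # Fraction of all actions that were purchases (0.0 for an empty profile).
            "purchase_ratio": purchases / total if total else 0.0,
        }


def build_profiles(log):
    """Aggregate (user_id, item_id, merchant_id, day, action) rows into
    user, merchant, and user-merchant interaction profiles."""
    users, merchants, pairs = defaultdict(Profile), defaultdict(Profile), defaultdict(Profile)
    for user_id, item_id, merchant_id, day, action in log:
        users[user_id].add(item_id, day, action)
        merchants[merchant_id].add(item_id, day, action)
        pairs[(user_id, merchant_id)].add(item_id, day, action)
    return users, merchants, pairs


def feature_vector(user_id, merchant_id, users, merchants, pairs):
    """Flatten the user, merchant, and pair profiles into one prefixed feature dict.
    Entities absent from the log get an empty profile (all counts zero)."""
    row = {}
    for prefix, profile in (
        ("u_", users.get(user_id, Profile())),
        ("m_", merchants.get(merchant_id, Profile())),
        ("um_", pairs.get((user_id, merchant_id), Profile())),
    ):
        for name, value in profile.features().items():
            row[prefix + name] = value
    return row


if __name__ == "__main__":
    # Toy log: (user_id, item_id, merchant_id, day, action); all values are made up.
    log = [
        ("u1", "i1", "m1", "1111", PURCHASE),
        ("u1", "i2", "m1", "1111", CLICK),
        ("u1", "i3", "m2", "1110", CLICK),
        ("u2", "i1", "m1", "1111", PURCHASE),
    ]
    users, merchants, pairs = build_profiles(log)
    print(feature_vector("u1", "m1", users, merchants, pairs))
```

Each feature is prefixed by the profile it came from (`u_`, `m_`, `um_`), so rows for many (user, merchant) pairs can be stacked into a single training matrix for whichever classifier is chosen. Looking up profiles with `.get` rather than indexing keeps unseen pairs from being silently inserted into the profile tables.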

Supplementary Material

MP4 File (kdd2016_chen_buyer_prediction_01-acm.mp4), 402.87 MB

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2016, 2176 pages

ISBN: 9781450342322

DOI: 10.1145/2939672

General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)

Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13-17, 2016

San Francisco, California, USA

Acceptance Rates

KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
