DOI: 10.1145/2939672.2939674
research-article

Repeat Buyer Prediction for E-Commerce

Published: 13 August 2016

Abstract

Merchants often acquire a large number of new buyers during promotions. However, many of these buyers are one-time deal hunters, so the promotions may have little lasting impact on sales. Merchants therefore need to identify which buyers can be converted into regular, loyal customers and then target them, reducing promotion cost and increasing the return on investment (ROI). At the International Joint Conference on Artificial Intelligence (IJCAI) 2015, Alibaba hosted an international competition for repeat buyer prediction based on sales data from the 2014 "Double 11" shopping event on Tmall.com. We won first place in stage 1 of the competition, out of 753 teams. In this paper, we present our winning solution, which consists of comprehensive feature engineering and model training. Through extensive feature engineering, we created profiles for users, merchants, brands, categories, and items, as well as for their interactions. These profiles are useful not only for this particular prediction task but also for other important e-commerce tasks, such as customer segmentation, product recommendation, and customer base augmentation for brands. Feature engineering is often the most important factor in the success of a prediction task, yet little work in the literature addresses feature engineering for prediction tasks in e-commerce. Our work provides useful hints and insights for data science practitioners in e-commerce.
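The abstract describes building profiles for users, merchants, and their interactions from behaviour logs. The following is a minimal Python sketch of that general idea, not the authors' code. It assumes a hypothetical log of `(user_id, merchant_id, action)` rows with a made-up action vocabulary (`click`, `cart`, `purchase`, `favorite`). From these rows it derives per-pair action counts and a few merchant-level aggregates. The names `LogRecord`, `user_merchant_profiles`, and `merchant_profiles` are illustrative. The smoothed `purchase_ratio` is also an assumed example feature, not one taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# Hypothetical action vocabulary; the real Tmall logs may encode actions differently.
ACTIONS = ("click", "cart", "purchase", "favorite")


class LogRecord(NamedTuple):
    """One hypothetical behaviour-log row: a user acting at a merchant."""
    user_id: int
    merchant_id: int
    action: str


def user_merchant_profiles(
    logs: Iterable[LogRecord],
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Count each action per (user, merchant) pair and add a smoothed purchase ratio."""
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for rec in logs:
        if rec.action not in ACTIONS:
            raise ValueError(f"unknown action: {rec.action!r}")
        counts[(rec.user_id, rec.merchant_id)][rec.action] += 1

    profiles: Dict[Tuple[int, int], Dict[str, float]] = {}
    for pair, c in counts.items():
        # Counter returns 0 for actions the pair never performed.
        feats = {f"n_{a}": float(c[a]) for a in ACTIONS}
        total = sum(c.values())
        feats["n_total"] = float(total)
        # Add-one (Laplace) smoothing keeps the ratio defined and damps pairs with few actions.
        feats["purchase_ratio"] = (c["purchase"] + 1) / (total + 2)
        profiles[pair] = feats
    return profiles


def merchant_profiles(logs: Iterable[LogRecord]) -> Dict[int, Dict[str, float]]:
    """Aggregate merchant-level features: distinct visitors, distinct buyers, buyer share."""
    visitors: Dict[int, Set[int]] = defaultdict(set)
    buyers: Dict[int, Set[int]] = defaultdict(set)
    for rec in logs:
        visitors[rec.merchant_id].add(rec.user_id)
        if rec.action == "purchase":
            buyers[rec.merchant_id].add(rec.user_id)
    return {
        m: {
            "n_visitors": float(len(users)),
            "n_buyers": float(len(buyers[m])),
            "buyer_share": len(buyers[m]) / len(users),
        }
        for m, users in visitors.items()
    }


if __name__ == "__main__":
    demo = [
        LogRecord(1, 10, "click"),
        LogRecord(1, 10, "click"),
        LogRecord(1, 10, "purchase"),
        LogRecord(2, 10, "click"),
        LogRecord(2, 20, "favorite"),
    ]
    print(user_merchant_profiles(demo)[(1, 10)])
    print(merchant_profiles(demo)[10])
```

Pass a list rather than a one-shot generator if both functions read the same logs. The same counting pattern extends to brand, category, and item keys by swapping the grouping tuple.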

Supplementary Material

MP4 File (kdd2016_chen_buyer_prediction_01-acm.mp4)

    Information

    Published In

    KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2016
    2176 pages
    ISBN:9781450342322
    DOI:10.1145/2939672
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. e-commerce
    2. feature engineering
    3. repeat buyer prediction

    Conference

    KDD '16
    Acceptance Rates

    KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 51
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 28 Feb 2025

    Cited By

    • (2025) Conditional Potential User Mining framework via explainable surrogate models. Expert Systems with Applications 262 (125587). DOI: 10.1016/j.eswa.2024.125587. Online publication date: Mar-2025.
    • (2024) Comparison Study of Biological Age Estimation Methods Using Korean National Health Bigdata. Journal of Health Informatics and Statistics 49:3 (229-237). DOI: 10.21032/jhis.2024.49.3.229. Online publication date: 31-Aug-2024.
    • (2024) Sentiment-based predictive models for online purchases in the era of marketing 5.0: a systematic review. Journal of Big Data 11:1. DOI: 10.1186/s40537-024-00947-0. Online publication date: 5-Aug-2024.
    • (2024) FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (1805-1818). DOI: 10.1109/ICDE60146.2024.00146. Online publication date: 13-May-2024.
    • (2024) Prediction Method of O2O Coupon Based on Multi-Grained Attention Mechanism of CNN and Bi-GRU. IEEE Access 12 (16902-16914). DOI: 10.1109/ACCESS.2024.3359052. Online publication date: 2024.
    • (2023) Effect of Low-Level Interaction Data in Repeat Purchase Prediction Task. International Journal of Human–Computer Interaction 40:10 (2515-2533). DOI: 10.1080/10447318.2023.2175973. Online publication date: 17-Feb-2023.
    • (2023) Application of Business Big Data Management and Decision Making. E-Commerce Big Data Mining and Analytics (181-203). DOI: 10.1007/978-981-99-3588-8_9. Online publication date: 30-Jul-2023.
    • (2022) Mining Willing-to-Pay Behavior Patterns from Payment Datasets. ACM Transactions on Intelligent Systems and Technology 13:1 (1-19). DOI: 10.1145/3485848. Online publication date: 6-Feb-2022.
    • (2022) Towards purchase prediction: a voting-based method leveraging transactional information. 2022 5th International Conference on Data Science and Information Technology (DSIT) (1-5). DOI: 10.1109/DSIT55514.2022.9943898. Online publication date: 22-Jul-2022.
    • (2022) A comparative study of combining tree‐based feature selection methods and classifiers in personal loan default prediction. Journal of Forecasting. DOI: 10.1002/for.2856. Online publication date: 11-May-2022.
