Direction of the Difference Between Bayesian Model Averaging and the Best-Fit Model on Scarce-Data Low-Correlation Churn Prediction

Darwen, Paul J.

doi:10.1007/978-981-99-5834-4_17

Paul J. Darwen (ORCID: orcid.org/0000-0002-3481-0701)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13995)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

On a scarce-data customer churn prediction problem, using the tiny differences between the predictions of (1) the single best-fit model and (2) the ensemble from Bayesian model averaging gives greater accuracy than state-of-the-art approaches such as XGBoost. The proposed approach reflects the cost-benefit aspect of many such problems: for customer churn, incentives to stay are expensive, so what is needed is a short list of customers with a high probability of churning. The approach works even though, in every test case, the best-fit model and Bayesian model averaging predict the same outcome. It suits many scarce-data prediction problems in commerce and medicine.
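
The preview does not give the chapter's procedure, so the following is only a minimal sketch of the general idea of comparing a Bayesian model averaging (BMA) prediction with the best-fit model's prediction. It assumes equal model priors, so that BMA weights are proportional to each model's likelihood, and it assumes that customers are ranked by the signed difference. The function names, the ranking rule, and the toy numbers are all hypothetical.

```python
import math

def bma_weights(log_likelihoods):
    """Posterior model weights, ASSUMING equal priors (softmax of log-likelihoods)."""
    m = max(log_likelihoods)  # subtract the max for numerical stability
    unnorm = [math.exp(ll - m) for ll in log_likelihoods]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def difference_scores(model_probs, log_likelihoods):
    """Signed difference (BMA prediction minus best-fit prediction) per customer.

    model_probs[i][j] is the churn probability that model i gives customer j.
    """
    weights = bma_weights(log_likelihoods)
    # The best-fit model is the one with the highest likelihood.
    best = max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])
    scores = []
    for j in range(len(model_probs[0])):
        bma = sum(w * probs[j] for w, probs in zip(weights, model_probs))
        scores.append(bma - model_probs[best][j])
    return scores

# Toy data (hypothetical): three models, three customers.
log_liks = [-10.0, -10.5, -12.0]
probs = [
    [0.30, 0.40, 0.20],  # best-fit model
    [0.35, 0.38, 0.20],
    [0.50, 0.30, 0.20],
]
weights = bma_weights(log_liks)
scores = difference_scores(probs, log_liks)
# Rank customers from the most positive difference to the most negative.
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In this toy case every model puts each customer below 0.5, so the best-fit model and BMA agree on the predicted outcome for all three customers, yet the signed differences still order them.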

Notes

1. See https://www.kaggle.com/huzaiftila/customer-churn-prediction-analysis.
2. Some software packages do this automatically, such as SPSS and Unistat.
3. Why 30 feature sets? Kreft’s 30-30 rule says that 30 groups of at least 30 each is reasonable. Here each group has thousands of models, so 30 of those groups should be enough. More than 30 would be desirable, but the CPU cost is already large.
4. Clopper-Pearson does not approximate a binomial distribution with a normal distribution; instead, it gives exact results for small samples.
5. For some algorithms, a tiny increase in the threshold gives more than one extra customer, so the short list is not always exactly 60.
6. Other MLP learning parameters were very ordinary: the optimizer was RMSprop, the activation function was ReLU for the hidden layers and sigmoid for the output, and the initializer was GlorotUniform.
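
As an illustration of the Clopper-Pearson note above, the following sketch computes the interval directly from the binomial distribution by bisection, with no normal approximation. It uses only the Python standard library. The function names and the example counts are hypothetical and are not results from the chapter.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target, iters=200):
    """Find p in [0, 1] with g(p) == target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lower = _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # i.e. P(X > k) equals 1 - alpha / 2.
    if k == n:
        upper = 1.0
    else:
        upper = _solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

# Hypothetical counts: 5 successes out of 10 trials.
low, high = clopper_pearson(5, 10)
```

For zero successes the upper bound has the closed form 1 - (alpha/2)^(1/n), which gives a quick check on the bisection.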

Acknowledgments

The author thanks Lachlan Butler, Harrison Burrows, and Juan Moredo for technical support. The author also thanks the anonymous peer reviewers for critically reading the manuscript and suggesting substantial improvements.

Author information

Authors and Affiliations

James Cook University, 349 Queen Street, Brisbane, QLD, Australia
Paul J. Darwen

Authors

Paul J. Darwen

Corresponding author

Correspondence to Paul J. Darwen.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Siridech Boonsang
Iwate Prefectural University, Iwate, Japan
Hamido Fujita
Wrocław University of Science and Technology, Wrocław, Poland
Bogumiła Hnatkowska
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
Malaysia Japan International Institute of Technology, Kuala Lumpur, Malaysia
Ali Selamat

About this paper

Cite this paper

Darwen, P.J. (2023). Direction of the Difference Between Bayesian Model Averaging and the Best-Fit Model on Scarce-Data Low-Correlation Churn Prediction. In: Nguyen, N.T., et al. Intelligent Information and Database Systems. ACIIDS 2023. Lecture Notes in Computer Science, vol 13995. Springer, Singapore. https://doi.org/10.1007/978-981-99-5834-4_17

DOI: https://doi.org/10.1007/978-981-99-5834-4_17
Published: 05 September 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5833-7
Online ISBN: 978-981-99-5834-4
eBook Packages: Computer Science, Computer Science (R0)
