ForeXGBoost: passenger car sales prediction based on XGBoost

Distributed and Parallel Databases

Abstract

The rapid development of machine learning has spurred wide application across industries, where prediction models are built to forecast sales and help enterprises and governments plan better. In 2018, Alibaba Cloud and the Yancheng Municipal Government held a competition calling for global efforts to build machine learning models that accurately forecast vehicle sales from large-scale datasets. This paper presents the design, implementation, and evaluation of ForeXGBoost, our proposed model, which won first place in the competition. ForeXGBoost uses carefully designed data-filling algorithms for missing values to improve data quality, and it applies a sliding window to extract features from historical sales and production data, which improves prediction accuracy. We conduct an extensive study of the influence of different attributes on vehicle sales via information gain and data correlation, and on that basis select the most indicative features from the feature set. Furthermore, we leverage the XGBoost algorithm to achieve high prediction accuracy with a short running time. Extensive experiments confirm that ForeXGBoost achieves high prediction accuracy with low overhead.
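The abstract names two preprocessing steps, missing-value filling and sliding-window feature extraction, without giving their details. The following is a minimal sketch of the general idea only, not the authors' algorithms. It assumes a monthly sales series held in a plain list with `None` for missing months. The neighbour-mean fill is a simple stand-in, and the function names, the feature names (`lag_k`, `window_mean`), and the sample numbers are all hypothetical.

```python
from statistics import mean


def fill_missing(series):
    """Fill None entries with the mean of the adjacent values.

    A simple stand-in for illustration; the paper's filling
    algorithms are not described in the abstract.
    """
    filled = list(series)
    for i, value in enumerate(filled):
        if value is not None:
            continue
        # Previous value (already filled if it was a gap) and next known value.
        before = filled[i - 1] if i > 0 else None
        after = next((v for v in series[i + 1:] if v is not None), None)
        neighbours = [v for v in (before, after) if v is not None]
        if not neighbours:
            raise ValueError("series contains no known values")
        filled[i] = mean(neighbours)
    return filled


def sliding_window_features(series, window):
    """Return (features, target) rows: the last `window` values describe the next one."""
    rows = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        # lag_1 is the most recent month, lag_2 the one before it, and so on.
        features = {f"lag_{k}": history[-k] for k in range(1, window + 1)}
        features["window_mean"] = mean(history)
        rows.append((features, series[t]))
    return rows


# Hypothetical monthly sales with two missing months.
monthly_sales = [120, None, 150, 160, None, 170]
filled = fill_missing(monthly_sales)
rows = sliding_window_features(filled, window=3)
```

Each row pairs a feature dictionary with the sales value it is meant to predict, so the rows could be passed to a tree-boosting regressor such as the one provided by the XGBoost library.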

Acknowledgements

Funding was provided by the National Natural Science Foundation of China (Nos. 61772377, 61572370, 91746206), the Natural Science Foundation of Hubei Province of China (No. 2017CFA007), and the Science and Technology Planning Project of Shenzhen (JCYJ20170818112550194).

Author information

Corresponding author

Correspondence to Libing Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xia, Z., Xue, S., Wu, L. et al. ForeXGBoost: passenger car sales prediction based on XGBoost. Distrib Parallel Databases 38, 713–738 (2020). https://doi.org/10.1007/s10619-020-07294-y

  • DOI: https://doi.org/10.1007/s10619-020-07294-y
