
Scaling forecasting algorithms using clustered modeling

  • Regular Paper
The VLDB Journal

Abstract

Research on forecasting has traditionally focused on building more accurate statistical models for a given time series. These models are mostly applied to limited data because of efficiency and scalability problems. However, many enterprise applications require scalable forecasting over a large number of data series. For example, telecommunication companies need to forecast each customer’s traffic load to understand usage behavior and to tailor targeted campaigns. Forecasting models are typically applied to aggregate data to estimate total traffic volume for revenue estimation and resource planning. They cannot easily be applied to each user individually, however, because building accurate models for a large number of users would be time-consuming. The problem is exacerbated when forecasting is continuous and the models must be updated periodically. This paper addresses the problem of building and continuously updating forecasting models for multiple data series. We propose dynamic clustered modeling for forecasting, in which representative models play a role analogous to cluster centers. We apply the models to each individual series through iterative nonlinear optimization. We develop two approaches: Integrated Clustered Modeling, which performs clustering and modeling simultaneously, and Sequential Clustered Modeling, which applies them one after the other. Our findings indicate that modeling an individual’s behavior using its segment’s model can be more scalable and more accurate than fitting an individual model. The grouped models avoid overfitting and capture common motifs even in noisy data. Experimental results from a telco CRM application show that the method is efficient and scalable, and also more accurate than separate individual models.
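The abstract describes representative models acting as cluster centers, built either sequentially (cluster, then model) or in one integrated loop. The Python/NumPy sketch below illustrates both ideas under stated assumptions; it is not the authors’ implementation. Assumptions: each representative model is an AR(p) model fitted by pooled least squares (not necessarily the paper’s model family); series are z-normalized so that one model can serve series of different volume; the sequential variant clusters on sample autocorrelation features; and the integrated variant reassigns each series to whichever cluster model gives it the lowest in-sample one-step error. Simple per-series rescaling stands in for the paper’s iterative nonlinear optimization, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def zscore(s):
    """Normalise one series so that series of different volume can share a model."""
    s = np.asarray(s, dtype=float)
    mu, sd = s.mean(), s.std()
    sd = sd if sd > 0 else 1.0
    return (s - mu) / sd, mu, sd

def lagged(s, p):
    """Rows of p past values (oldest first) paired with the value that follows them."""
    X = np.array([s[t - p:t] for t in range(p, len(s))])
    return X, s[p:]

def fit_ar(members, p):
    """AR(p) coefficients fitted by least squares, pooled over all series in a cluster."""
    pairs = [lagged(s, p) for s in members]
    X = np.vstack([Xi for Xi, _ in pairs])
    y = np.concatenate([yi for _, yi in pairs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mse(coef, s, p):
    """In-sample one-step squared error of an AR model on one (z-scored) series."""
    X, y = lagged(s, p)
    return float(np.mean((X @ coef - y) ** 2))

def fit_clusters(z, labels, k, p):
    """One representative model per cluster; an empty cluster gets a zero model."""
    return [fit_ar([z[i] for i in np.flatnonzero(labels == c)], p)
            if np.any(labels == c) else np.zeros(p) for c in range(k)]

def sequential_clustered_ar(data, k, p=2, iters=20):
    """Cluster first (k-means on lag 1..p autocorrelations), then model each cluster."""
    z = [zscore(s)[0] for s in data]
    F = np.array([[np.mean(s[lag:] * s[:-lag]) for lag in range(1, p + 1)] for s in z])
    centers = [F[0]]  # deterministic farthest-point initialisation
    while len(centers) < k:
        d = np.min([np.linalg.norm(F - c, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):  # plain Lloyd iterations
        labels = np.argmin(np.linalg.norm(F[:, None, :] - centers[None], axis=2), axis=1)
        centers = np.array([F[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels, fit_clusters(z, labels, k, p)

def integrated_clustered_ar(data, k, p=2, iters=20):
    """Cluster and model together: the AR models act as cluster centres, and each
    series moves to the model that forecasts it best until no series moves."""
    z = [zscore(s)[0] for s in data]
    labels = np.arange(len(z)) % k  # round-robin start (an assumption)
    models = fit_clusters(z, labels, k, p)
    for _ in range(iters):
        new = np.array([np.argmin([mse(m, s, p) for m in models]) for s in z])
        if np.array_equal(new, labels):
            break
        labels = new
        models = fit_clusters(z, labels, k, p)
    return labels, models

def forecast_next(s, coef, p):
    """One-step forecast: apply the shared model to the z-scored series, then undo it."""
    z, mu, sd = zscore(s)
    return float(z[-p:] @ coef) * sd + mu
```

For example, `labels, models = integrated_clustered_ar(data, k=2, p=1)` followed by `forecast_next(data[i], models[labels[i]], 1)` gives a one-step forecast for series `i` from its cluster’s shared model. The integrated loop is essentially k-means with fitted models in place of centroids, which is why it stops as soon as no series changes cluster.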

Acknowledgments

This work is supported in part by The Scientific and Technological Research Council of Turkey under Grant EEEAG-111E217 and The Turkish Academy of Sciences.

Author information

Correspondence to Hakan Ferhatosmanoglu.

About this article

Cite this article

Gür, İ., Güvercin, M. & Ferhatosmanoglu, H. Scaling forecasting algorithms using clustered modeling. The VLDB Journal 24, 51–65 (2015). https://doi.org/10.1007/s00778-014-0363-0

  • DOI: https://doi.org/10.1007/s00778-014-0363-0
