Abstract
Research on forecasting has traditionally focused on building more accurate statistical models for a given time series. These models are mostly applied to limited data because of efficiency and scalability problems. However, many enterprise applications require scalable forecasting on a large number of data series. For example, telecommunication companies need to forecast each customer's traffic load to understand usage behavior and to tailor targeted campaigns. Forecasting models are typically applied to aggregate data to estimate the total traffic volume for revenue estimation and resource planning. However, they cannot easily be applied to each user individually, because building accurate models for a large number of users would be time-consuming. The problem is exacerbated when the forecasting process is continuous and the models need to be updated periodically. This paper addresses the problem of building and updating forecasting models continuously for multiple data series. We propose dynamic clustered modeling for forecasting, in which representative models play a role analogous to cluster centers. We apply the models to each individual series through iterative nonlinear optimization. We develop two approaches: Integrated Clustered Modeling performs clustering and modeling simultaneously, whereas Sequential Clustered Modeling applies them sequentially. Our findings indicate that modeling an individual's behavior using the model of its segment can be more scalable and accurate than using an individual model. The grouped models avoid overfitting and capture common motifs even on noisy data. Experimental results from a telco CRM application show that the method is efficient and scalable, and also more accurate than maintaining separate individual models.
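The idea of representative models acting as cluster centers can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a zero-mean AR(1) model per cluster in place of a full ARIMA model, and a closed-form pooled least-squares fit in place of the iterative nonlinear optimization described above. The function names (`fit_ar1`, `one_step_mse`, `clustered_ar1`, `simulate`) and the synthetic data are hypothetical and serve only as illustration. Each series is assigned to the model that predicts it best, each model is refitted on its members, and the two steps repeat until the assignment stops changing.

```python
import random


def fit_ar1(series_group):
    """Pooled least-squares estimate of phi in x[t] = phi * x[t-1] + noise."""
    num = sum(s[t] * s[t - 1] for s in series_group for t in range(1, len(s)))
    den = sum(s[t - 1] ** 2 for s in series_group for t in range(1, len(s)))
    return num / den if den else 0.0


def one_step_mse(series, phi):
    """Mean squared one-step-ahead error (series must have at least 2 points)."""
    errs = [(series[t] - phi * series[t - 1]) ** 2 for t in range(1, len(series))]
    return sum(errs) / len(errs)


def clustered_ar1(all_series, k, max_iter=20):
    """Alternate between assigning series to models and refitting the models."""
    # Initialise the k models by spreading them over the sorted individual fits.
    individual = sorted(fit_ar1([s]) for s in all_series)
    n = len(individual)
    phis = [individual[round(i * (n - 1) / max(k - 1, 1))] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each series joins the model with the lowest error.
        new_labels = [
            min(range(k), key=lambda j: one_step_mse(s, phis[j]))
            for s in all_series
        ]
        if new_labels == labels:
            break  # assignments are stable, so the models will not change
        labels = new_labels
        # Update step: refit each representative model on its member series.
        for j in range(k):
            members = [s for s, lab in zip(all_series, labels) if lab == j]
            if members:  # an empty cluster keeps its previous model
                phis[j] = fit_ar1(members)
    return phis, labels


def simulate(phi, n, rng):
    """Synthetic AR(1) series used only for this demonstration."""
    x = [1.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 0.1))
    return x


rng = random.Random(0)
data = [simulate(0.9, 200, rng) for _ in range(5)]
data += [simulate(-0.5, 200, rng) for _ in range(5)]

phis, labels = clustered_ar1(data, k=2)
# Forecast the next value of every series with its cluster's shared model.
next_values = [phis[lab] * s[-1] for s, lab in zip(data, labels)]
```

In this sketch only `k` models are fitted and stored regardless of the number of series, which is the source of the scalability benefit the abstract refers to; pooling the members' data also gives each shared model more observations than any single series provides.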
Acknowledgments
This work is supported in part by The Scientific and Technological Research Council of Turkey under Grant EEEAG-111E217 and The Turkish Academy of Sciences.
Cite this article
Gür, İ., Güvercin, M. & Ferhatosmanoglu, H. Scaling forecasting algorithms using clustered modeling. The VLDB Journal 24, 51–65 (2015). https://doi.org/10.1007/s00778-014-0363-0
DOI: https://doi.org/10.1007/s00778-014-0363-0