Abstract
The Internet of Things (IoT) sparks a revolution in time series forecasting. Traditional techniques forecast each time series individually, which becomes infeasible when the focus shifts to thousands of time series exhibiting anomalies such as noise and missing values. This work presents CSAR, a technique that forecasts a set of time series with only one model, and a feature-aware partitioning that applies CSAR to subsets of similar time series. These techniques provide accurate forecasts a hundred times faster than traditional techniques, preparing forecasting for the arising challenges of the IoT era.
Funding source: European Regional Development Fund
Award Identifier / Grant number: 100320127
Funding source: Horizon 2020 Framework Programme
Award Identifier / Grant number: 731232
Funding statement: This work is partly funded (1) by the European Regional Development Fund (ERDF) under co-financing by the Free State of Saxony (100320127) and Systema GmbH, and (2) within the European Union’s Horizon 2020 research and innovation program under grant agreement No. 731232.
About the authors
Claudio Hartmann studied Computer Science at the Technische Universität Dresden (Diplom 2013, Promotion 2018). Since 2018, he has been a postdoc in the Database Systems Research Group of Prof. Wolfgang Lehner. His research focuses on time series forecasting, especially the prediction of large-scale data sets.
Lars Kegel studied Computer Science at Technische Universität Dresden, where he received his diploma in 2015. His research focuses on feature-based time series analytics, i.e., on statistical representations for time series data sets and their use in data-mining tasks.
Wolfgang Lehner is a full professor and head of the Database Systems Research Group at Technische Universität Dresden. His research interests range from designing data-management infrastructures from a modeling perspective and supporting data-intensive applications and processes in large distributed information systems, to adding novel database functionality to relational database engines in support of data science applications and exploiting modern hardware capabilities to optimize main-memory database systems.
© 2020 Walter de Gruyter GmbH, Berlin/Boston