Published by De Gruyter Oldenbourg, March 6, 2020

Feature-aware forecasting of large-scale time series data sets

Claudio Hartmann, Lars Kegel and Wolfgang Lehner

Abstract

The Internet of Things (IoT) sparks a revolution in time series forecasting. Traditional techniques forecast each time series individually, which becomes infeasible when the focus shifts to thousands of time series that exhibit anomalies such as noise and missing values. This work presents CSAR, a technique that forecasts a set of time series with a single model, and a feature-aware partitioning that applies CSAR to subsets of similar time series. These techniques provide accurate forecasts a hundred times faster than traditional techniques, preparing forecasting for the emerging challenges of the IoT era.
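As a rough illustration of the idea of forecasting a whole set of time series with one model, the following sketch fits a single shared AR(1) coefficient over all series at once and skips observation pairs with missing values. It is not the CSAR model itself, whose details the abstract does not give; the function names `fit_pooled_ar1` and `forecast_next`, the use of `None` for missing values, and the choice of an intercept-free AR(1) are all assumptions made for this example.

```python
from typing import List, Optional, Sequence

# A series is a sequence of observations; None marks a missing value (assumption).
Series = Sequence[Optional[float]]


def fit_pooled_ar1(series_set: Sequence[Series]) -> float:
    """Estimate one AR(1) coefficient shared by all series (least squares).

    Every (previous, current) pair from every series contributes to the same
    estimate; pairs that contain a missing value are skipped.
    """
    num = 0.0
    den = 0.0
    for series in series_set:
        for prev, curr in zip(series, series[1:]):
            if prev is None or curr is None:
                continue  # tolerate gaps instead of imputing them
            num += prev * curr
            den += prev * prev
    if den == 0.0:
        raise ValueError("no usable observation pairs")
    return num / den


def forecast_next(series_set: Sequence[Series], phi: float) -> List[Optional[float]]:
    """One-step forecast per series from the shared coefficient.

    Uses the most recent observed value as the lag input, which is a
    simplification when the final value of a series is missing.
    """
    forecasts: List[Optional[float]] = []
    for series in series_set:
        last = next((v for v in reversed(series) if v is not None), None)
        forecasts.append(None if last is None else phi * last)
    return forecasts


if __name__ == "__main__":
    data = [[8.0, 4.0, 2.0, 1.0], [16.0, None, 4.0, 2.0]]
    phi = fit_pooled_ar1(data)
    print(phi, forecast_next(data, phi))
```

Because the coefficient is estimated once for the whole set, the fitting cost does not grow with one model per series, and a series with gaps can still borrow information from the others. A feature-aware partitioning in this spirit would call `fit_pooled_ar1` separately on each subset of similar series.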

Funding statement: This work is partly funded (1) by the European Regional Development Fund (ERDF) under co-financing by the Free State of Saxony (100320127) and Systema GmbH, and (2) within the European Union’s Horizon 2020 research and innovation program under grant agreement No. 731232.

About the authors

Dr.-Ing. Claudio Hartmann

Claudio Hartmann studied Computer Science at the Technische Universität Dresden (Diplom 2013, doctorate 2018). Since 2018, he has been a postdoc in the Database Systems Research Group of Prof. Wolfgang Lehner. His research focuses on time series forecasting, especially the forecasting of large-scale data sets.

Dipl.-Inf. Lars Kegel

Lars Kegel studied Computer Science at Technische Universität Dresden, where he received his diploma in 2015. His research focuses on feature-based time series analytics, i.e., on statistical representations of time series data sets and their use in data-mining tasks.

Prof. Dr.-Ing. Wolfgang Lehner

Wolfgang Lehner is a full professor and head of the Database Systems Research Group at Technische Universität Dresden. His research interests range from designing data-management infrastructures from a modeling perspective and supporting data-intensive applications and processes in large distributed information systems to adding novel database functionality to relational database engines in support of data science applications and exploiting modern hardware capabilities to optimize main-memory database systems.

Received: 2019-10-02
Revised: 2020-02-07
Accepted: 2020-02-21
Published Online: 2020-03-06
Published in Print: 2020-05-27

© 2020 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 26.4.2024 from https://www.degruyter.com/document/doi/10.1515/itit-2019-0035/html