Interpretable multiple data streams clustering with clipped streams representation for the improvement of electricity consumption forecasting

Laurinec, Peter; Lucká, Mária

doi:10.1007/s10618-018-0598-2

Published: 16 November 2018

Data Mining and Knowledge Discovery, volume 33, pages 413–445 (2019)

Abstract

This paper presents ClipStream, a new interpretable approach to clustering multiple data streams in a smart grid, used to improve the forecasting accuracy of aggregated electricity consumption and to support grid analysis. Consumers' time series streams are compressed and represented by interpretable features extracted from the clipped representation. The proposed representation has low computational complexity and is incremental in the sense of the windowing method. Outlier consumers can be detected simply and quickly from the extracted features. The clustering phase consists of three parts: clustering of the non-outlier representations, aggregation of consumption within clusters, and an unsupervised change detection procedure on windows of the aggregated time series streams. ClipStream's behaviour and its improvement in forecasting accuracy were evaluated on four different real datasets containing variable patterns of electricity consumption. The clustering accuracy of the proposed feature extraction method from the clipped representation was evaluated on 85 time series datasets from a large public repository. The experimental results demonstrated the stability of ClipStream in improving forecasting accuracy and showed the suitability of the proposed representation in many of the tested applications.
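The idea of extracting interpretable features from a clipped representation can be illustrated with a minimal Python sketch. It assumes the common bit-level clipping rule (1 where a value exceeds the window mean, 0 otherwise) and computes a few run-length summaries of the resulting bits. The function names and the particular feature set (`max_run_1`, `sum_1`, `max_run_0`, `crossings`) are illustrative assumptions; the abstract does not specify which features ClipStream uses.

```python
import numpy as np


def clip(window):
    """Clip a window to bits: 1 where the value exceeds the window mean, else 0.

    Assumption: the mean is used as the clipping threshold.
    """
    window = np.asarray(window, dtype=float)
    return (window > window.mean()).astype(int)


def run_lengths(bits):
    """Run-length encode a bit sequence into (value, length) pairs."""
    runs = []
    for b in bits:
        b = int(b)
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([b, 1])  # start a new run
    return [(value, length) for value, length in runs]


def clipped_features(window):
    """Summarise one window by features of its clipped (binary) form.

    The feature set is hypothetical and chosen only for illustration.
    """
    bits = clip(window)
    runs = run_lengths(bits)
    ones = [length for value, length in runs if value == 1]
    zeros = [length for value, length in runs if value == 0]
    return {
        "max_run_1": max(ones, default=0),   # longest stretch above the mean
        "sum_1": int(bits.sum()),            # number of points above the mean
        "max_run_0": max(zeros, default=0),  # longest stretch at or below the mean
        "crossings": len(runs) - 1,          # number of times the mean is crossed
    }


# One window of (made-up) consumption readings for a single consumer.
features = clipped_features([1, 1, 5, 6, 1, 7, 1, 1])
print(features)
```

Because each window is summarised independently, a new window of a stream can be processed without revisiting earlier ones, which is one way to read the abstract's description of the representation as incremental in the sense of the windowing method.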

Acknowledgements

This work was partially supported by the Slovak Research and Development Agency, Grant Nos. APVV-16-0484 and APVV-16-0213, and the Scientific Grant Agency of The Slovak Republic, Grant No. VG 1/0458/18.

Author information

Authors and Affiliations

Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkovičova 2, 842 16, Bratislava, Slovak Republic
Peter Laurinec & Mária Lucká

Corresponding author

Correspondence to Peter Laurinec.

Additional information

Responsible editor: Jesse Davis, Elisa Fromont, Derek Greene, Bjorn Bringmann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Laurinec, P., Lucká, M. Interpretable multiple data streams clustering with clipped streams representation for the improvement of electricity consumption forecasting. Data Min Knowl Disc 33, 413–445 (2019). https://doi.org/10.1007/s10618-018-0598-2

Received: 26 November 2017
Accepted: 29 October 2018
Published: 16 November 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10618-018-0598-2
