CPLP: An algorithm for tracking the changes of power consumption patterns in load profile data over time

doi:10.1016/j.ins.2017.11.006

Information Sciences

Volume 429, March 2018, Pages 332-348

https://doi.org/10.1016/j.ins.2017.11.006

Abstract

In this paper, we propose a novel algorithm for tracking the Changes of Patterns in Load Profile (CPLP) data of factories. CPLP consists of two stages. The first stage clusters the load profiles in each time window and uses the clusters to model the power consumption patterns. We propose a new ensemble clustering method to cluster the load profiles in consecutive time windows. It uses a hierarchical binary k-means algorithm to generate component clusterings and a new objective function to combine them into the final clustering. The second stage tracks the changes of patterns along the time windows. We propose a new method to detect the change of clusters from one window to the next by using the distribution models of two related clusters in two neighboring windows. With this method, we can link the clusters in the sequence of time windows to track the patterns. Experiments on synthetic and real-world load profile data have shown that the proposed algorithm was able to track the changes of power consumption patterns of different factory groups and identify the period of significant change, which is very useful for smart grid applications.

Introduction

Load profile data provide information about power consumption behavior of selected customers over a given period. A load profile is a sequence of power consumption values measured by a smart meter at specified time intervals. A set of load profiles from selected customers can be used to analyze power consumption patterns and their changes over time. By tracking the changes of power consumption patterns from load profile data of factories, we can detect variations of power consumption in manufacturing processes, understand the work patterns of factories, and predict the productivity of factories. The information about the changes of power consumption patterns can find numerous applications in the power distribution industry such as determining optimal tariff rates, load forecasting, and energy demand management.

Clusters of load profiles in a given time window represent the power consumption patterns of users in that period. These patterns reflect the power consumption behavior of users. Understanding the changes of power consumption patterns over time is very important for smart grid management. Since load profile data are streaming data, approaches from concept drift analysis of data streams can be adopted to analyze the changes of power consumption patterns in load profile data [6], [30].

In this paper, we present a concept drift approach for tracking the changes of power consumption patterns of manufacturing factories over time. The load profiles of selected factories are represented as a matrix where each row is the load profile of a factory and each column is a set of power consumption measurements at a time slot. The matrix is vertically divided into a sequence of sub-matrices, each representing the sub-load profiles of the factories in one time window. Given this data representation, tracking the changes of power consumption patterns is achieved by finding the patterns in each time window and the changes of patterns along the sequence of windows. Solving this task poses two challenges. The first is to estimate the number of power consumption patterns, and the other is to define the procedure for change detection [19].
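The vertical division of the load-profile matrix into time windows can be sketched as follows. This is a minimal illustration, assuming equal-width windows of `w` time slots and plain Python lists for the matrix; the function name `split_into_windows` and the rule of dropping an incomplete trailing window are hypothetical choices, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows = factories, columns = time slots


def split_into_windows(X: Matrix, w: int) -> List[Matrix]:
    """Vertically split an N x J load-profile matrix into sub-matrices of w columns.

    Every window keeps all N rows (factories) and a contiguous block of w
    columns (time slots). A trailing block with fewer than w columns is
    dropped so that all windows have the same width (a sketch assumption).
    """
    if w <= 0:
        raise ValueError("window width must be positive")
    J = len(X[0]) if X else 0
    B = J // w  # number of complete windows
    return [[row[b * w:(b + 1) * w] for row in X] for b in range(B)]


# Toy data: 3 factories, 8 time slots.
X = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    [0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5],
    [2.0, 2.1, 1.9, 2.0, 1.0, 1.1, 0.9, 1.0],
]
windows = split_into_windows(X, w=4)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 2 3 4
```

With the 15-minute sampling used for the real-world data, a one-day window would correspond to w = 96 columns (24 × 4).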

This work proposes an algorithm for tracking the Changes of Patterns in Load Profiles (CPLP) of factories. The algorithm involves two steps. In the first step, we propose an ensemble clustering method to cluster the load profiles in each time window. A hierarchical binary k-means algorithm is used to generate multiple component clusterings. The output of this algorithm is a tree of clusters that represents a component clustering. We define a new stopping criterion for the tree generation process. By changing the input parameter value of tree generation, we generate multiple component clusterings from the same load profile data in a time window. We use a new objective function to combine the multiple component clusterings into a single clustering. In the second step, we track the changes of power consumption patterns along the time windows by using the clustering results obtained in all time windows. We define a new method to model two related clusters and check the change between the distributions of the two models. Two clusters in neighboring windows are related if they share the maximum number of objects. Based on the obtained changes in the distributions of related clusters, we define the survival pairs of clusters along the time windows to analyze the changes of power consumption patterns.
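The paper's objective function for combining component clusterings is not reproduced in this excerpt. As a stand-in that only illustrates what "combining component clusterings into a single clustering" means, the sketch below uses a different, widely known consensus technique (co-association, also called evidence accumulation): count how often two load profiles fall in the same cluster across component clusterings, then link profiles whose co-occurrence frequency reaches a threshold. The threshold of 0.5 and the function names are hypothetical.

```python
from typing import List


def co_association(labelings: List[List[int]]) -> List[List[float]]:
    """A[i][j] = fraction of component clusterings that put objects i and j together."""
    m, n = len(labelings), len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(labelings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Join objects whose co-association reaches the threshold (single linkage)."""
    A = co_association(labelings)
    n = len(A)
    parent = list(range(n))  # union-find forest over the objects

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel the connected components 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three hypothetical component clusterings of six load profiles.
components = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
print(consensus_labels(components))  # [0, 0, 0, 1, 1, 1]
```

Raising the threshold makes the consensus stricter: with `threshold=1.0`, only profiles grouped together in every component clustering are merged.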

We present a series of experiments conducted on both synthetic and real data. Experiments on synthetic data have shown that the CPLP algorithm significantly outperformed well-known state-of-the-art algorithms in almost all experiments. The experiments on the real-world load profile data set demonstrated that the CPLP algorithm is able to track the changes of power consumption patterns of different factory groups and identify the period of significant change. The real-world data set contains over 20,000 load profiles collected from manufacturing industries in Guangdong province of China over a period of six months in 2012. The power consumption measurements were collected at 15-minute intervals. Results on this data are important for applications in the power distribution industry. The obtained results can also help recommend suitable products to users and satisfy users' demands as far as possible.

The rest of this paper is organized as follows. Section 2 introduces the load profile data and the research problem. Section 3 reviews related work. Section 4 presents the ensemble clustering method for load profile data. Section 5 presents the method for tracking the changes of power consumption patterns. Section 6 describes the experiments on synthetic data. Experimental results on real-world data are shown in Section 7. Some concluding remarks are given in Section 8.

Section snippets

Load profile data and research problem

The load profile data of N factories and J time slot measurements are represented as an $N \times J$ matrix $X = [x_{ij}]$, where each row of X represents the load profile of one factory and each column is a set of N power consumption measurements at a time slot. The element $x_{ij}$ is the measurement of the ith factory at the jth time slot. In vector form, X can be written as

$$X = [X_1, X_2, \dots, X_N]^{\top} = [Y_1, Y_2, \dots, Y_J],$$

where $X_i$ is the load profile vector of factory i and $Y_j$ is the vector of consumption values of the N factories at time slot j. The sequence of Y

Related work

Change detection in data streams has been widely investigated because of its broad application potential across science and technology, for example, fraud detection, market analysis, medical condition monitoring, and network traffic control [1]. Many studies have proposed approaches for choosing, sampling, splitting, growing, and shrinking the distributions of the data in windows for optimal change detection [4]. For example, the Kullback–Leibler (KL) [9] divergence is used
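For reference, the KL divergence between two fitted distributions has a closed form when both are Gaussian. The sketch below computes KL(P || Q) for Gaussians with diagonal covariance, summing the univariate formula 0.5 * (ln(v1/v0) + (v0 + (m0 - m1)^2) / v1 - 1) over dimensions. Modeling a cluster's load profiles as a diagonal Gaussian, and comparing time slot k of one window with time slot k of the next (equal-width windows), are assumptions for illustration only, not necessarily the distribution model used by CPLP; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_diag_gaussian(rows: Sequence[Sequence[float]], eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Per-time-slot mean and (population) variance of a cluster; variance floored at eps."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    var = [max(sum((r[k] - mean[k]) ** 2 for r in rows) / n, eps) for k in range(d)]
    return mean, var


def kl_diag_gaussian(mu0, var0, mu1, var1) -> float:
    """KL(P || Q) for P = N(mu0, diag(var0)) and Q = N(mu1, diag(var1))."""
    return sum(
        0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


# A hypothetical cluster observed in two neighboring windows (2 time slots each).
p = fit_diag_gaussian([[0.0, 1.0], [2.0, 3.0]])  # mean [1, 2], var [1, 1]
q = fit_diag_gaussian([[1.0, 2.0], [3.0, 4.0]])  # mean [2, 3], var [1, 1]
print(kl_diag_gaussian(*p, *q))  # 1.0
```

KL divergence is asymmetric, so KL(P || Q) and KL(Q || P) generally differ once the variances differ; a symmetric variant is a common choice when neither window is a natural reference.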

Ensemble clustering method

In this section, we present a new ensemble clustering method to discover power consumption patterns modeled as clusters in each time window. We first describe a hierarchical binary k-means algorithm for generating the component clusterings. The advantage of this algorithm is that it does not require the number of clusters to be known in advance. Then, we present a new objective function for generating the clustering ensemble from the component clusterings.
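The idea of hierarchical binary k-means, recursively bisecting clusters with 2-means until a stopping rule fires so that the number of clusters need not be fixed in advance, can be sketched as below. The paper's own stopping criterion is not given in this excerpt; the rule used here (do not split a cluster if bisecting it reduces the within-cluster sum of squares by less than `min_gain` times the SSE of the whole data set) is a hypothetical stand-in. Varying `min_gain` yields different trees, loosely mirroring how the text generates multiple component clusterings by changing an input parameter.

```python
import random
from typing import List, Tuple

Vec = List[float]


def sse(points: List[Vec]) -> float:
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    d = len(points[0])
    c = [sum(p[k] for p in points) / len(points) for k in range(d)]
    return sum((p[k] - c[k]) ** 2 for p in points for k in range(d))


def two_means(points: List[Vec], rng: random.Random, iters: int = 20) -> Tuple[List[int], List[int]]:
    """Lloyd's k-means with k = 2; returns the member indices of the two halves."""
    d = len(points[0])
    centers = [list(p) for p in rng.sample(points, 2)]
    groups: Tuple[List[int], List[int]] = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, p in enumerate(points):
            d0 = sum((a - b) ** 2 for a, b in zip(p, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(p, centers[1]))
            groups[0 if d0 <= d1 else 1].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller keeps the cluster as a leaf
        centers = [[sum(points[i][k] for i in g) / len(g) for k in range(d)] for g in groups]
    return groups


def hierarchical_binary_kmeans(points: List[Vec], min_gain: float = 0.05,
                               min_size: int = 2, seed: int = 0) -> List[int]:
    """Recursively bisect with 2-means; return one leaf-cluster label per point.

    Hypothetical stopping rule: a cluster is not split if it has fewer than
    min_size members or if bisecting it reduces the SSE by less than
    min_gain * SSE(all points).
    """
    rng = random.Random(seed)
    root_sse = sse(points)
    stack = [list(range(len(points)))]
    leaves: List[List[int]] = []
    while stack:
        idx = stack.pop()
        members = [points[i] for i in idx]
        if len(idx) >= max(min_size, 2) and root_sse > 0:
            g0, g1 = two_means(members, rng)
            if g0 and g1:
                gain = sse(members) - sse([members[i] for i in g0]) - sse([members[i] for i in g1])
                if gain >= min_gain * root_sse:
                    stack.append([idx[i] for i in g0])
                    stack.append([idx[i] for i in g1])
                    continue
        leaves.append(idx)
    labels = [0] * len(points)
    for label, idx in enumerate(leaves):
        for i in idx:
            labels[i] = label
    return labels


# Three tight groups of four points, centered near x = 0, 3 and 10.
centers = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
data = [[x + dx, y + dy] for x, y in centers for dx in (0.0, 0.1) for dy in (0.0, 0.1)]
for g in (0.95, 0.5, 0.05):
    print(g, len(set(hierarchical_binary_kmeans(data, min_gain=g))))
# 0.95 1
# 0.5 2
# 0.05 3
```

A larger `min_gain` stops the bisection earlier and yields coarser clusterings; collecting the labelings for several values gives a set of component clusterings that a consensus step can then combine.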

Method for tracking the changes of power consumption patterns

Given the sequence of sub-matrices $W_{1}, W_{2}, \dots, W_{B}$ in B consecutive time windows, we can use the ensemble clustering method discussed in Section 4 to generate B ensemble clusterings $\lambda^{1}, \lambda^{2}, \dots, \lambda^{B}$, one for each time window. Let $\lambda^{i}$ and $\lambda^{i+1}$ be two ensemble clusterings of load profiles in two neighboring time windows $W_{i}$ and $W_{i+1}$, each consisting of a set of clusters. For any cluster $C_{1}$ from $\lambda^{i}$, we can find a set of clusters in $\lambda^{i+1}$ which have intersections of load profiles with $C_{1}$, as shown in
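Finding the clusters of the next window that intersect a given cluster, and picking the related one (the cluster sharing the most load profiles, following the definition in the introduction), reduces to set intersections over factory ids. This is a minimal sketch, assuming each ensemble clustering is stored as a dict from integer cluster id to a set of factory ids; that representation, the function names, and breaking ties by the smallest cluster id are assumptions. The subsequent comparison of the two clusters' distribution models is not reproduced here.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: cluster id -> set of factory ids.
Clustering = Dict[int, Set[int]]


def intersecting_clusters(c: Set[int], nxt: Clustering) -> Dict[int, int]:
    """Next-window clusters sharing at least one load profile with c, with overlap sizes."""
    return {cid: len(c & members) for cid, members in nxt.items() if c & members}


def related_cluster(c: Set[int], nxt: Clustering) -> Optional[int]:
    """Next-window cluster with the largest overlap; ties go to the smallest id (an assumption)."""
    overlaps = intersecting_clusters(c, nxt)
    if not overlaps:
        return None
    return min(overlaps, key=lambda cid: (-overlaps[cid], cid))


lam_i = {0: {1, 2, 3, 4}, 1: {5, 6, 7}}          # clustering of window W_i
lam_next = {0: {1, 2}, 1: {3, 4, 5}, 2: {6, 7}}  # clustering of window W_{i+1}
for cid, members in lam_i.items():
    print(cid, intersecting_clusters(members, lam_next), related_cluster(members, lam_next))
# 0 {0: 2, 1: 2} 0
# 1 {1: 1, 2: 2} 2
```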

Experiments on synthetic data

The CPLP algorithm was developed to cluster noisy power consumption data and to compute the change between two data distributions. Synthetic data are often used to validate a clustering algorithm [24]. To better understand the properties of the CPLP algorithm, synthetic data sets with different structures and added noise were first used to evaluate the clustering accuracy of the ensemble clustering method in comparison with other ensemble clustering algorithms,

Experimental results on real-world data

In this section, we present a series of experiments on real-world load profile data to verify the performance of the CPLP algorithm. First, we describe the load profile data and the experimental settings. Then, we present the experimental results with a detailed discussion.

Conclusion

In this paper, we have presented a new CPLP algorithm for tracking the changes of power consumption patterns in the load profile data of factories. It contains two parts. In the first part, we have used a new ensemble clustering method to cluster the load profile data of each time window and used the clusters to model the power consumption patterns. We have developed a new hierarchical binary k-means algorithm to generate high-quality component clusterings without sacrificing too much diversity.

Acknowledgments

This work was supported by the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284) and the National Natural Science Foundation of China (NSFC) Grant No. 61750110536. This work was partially supported by the GDNSF fund (2015A030313782) and the SUSTech Startup fund (Y01236215).

References (32)

A. Amini et al.
Mudi-stream: a multi density clustering algorithm for evolving data stream
J. Network Comput. Appl.
(2016)
F. Cao et al.
Trend analysis of categorical data streams with a concept change method
Inf. Sci.
(2014)
G. Chicco
Overview and performance assessment of the clustering methods for electrical load pattern grouping
Energy
(2012)
J. Hu et al.
Hierarchical cluster ensemble model based on knowledge granulation
Knowl. Based Syst.
(2016)
L. Jing et al.
Stratified feature sampling method for ensemble clustering of high dimensional data
Pattern Recognit.
(2015)
I. Khan et al.
Incremental density-based ensemble clustering over evolving data streams
Neurocomputing
(2016)
G. Milligan et al.
The validation of four ultrametric clustering algorithms
Pattern Recognit.
(1980)
L. Zhang et al.
Fuzzy c-means clustering of incomplete data based on probabilistic information granules of missing values
Knowl. Based Syst.
(2016)
W. Zhong et al.
Clinical charge profiles prediction for patients diagnosed with chronic diseases using multi-level support vector machine
Expert Syst. Appl.
(2012)
C.C. Aggarwal
Data Streams: Models and Algorithms, Vol. 31
(2007)
A. Albert et al.
Smart meter driven segmentation: what your consumption says about you
IEEE Trans. Power Syst.
(2013)
A. Bifet et al.
Learning from time-changing data with adaptive windowing
SDM
(2007)
A. Bivens et al.
Network-based intrusion detection using neural networks
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York
(2002)
C. Carpineto et al.
Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval
IEEE Trans. Pattern Anal. Mach. Intell.
(2012)
T. Dasu et al.
An information-theoretic approach to detecting changes in multi-dimensional data streams
Proc. Symp. on the Interface of Statistics, Computing Science, and Applications
(2006)
A.P. Dempster et al.
Maximum likelihood from incomplete data via the EM algorithm
J. R. Stat. Soc. Ser. B
(1977)
