CPLP: An algorithm for tracking the changes of power consumption patterns in load profile data over time
Introduction
Load profile data provide information about the power consumption behavior of selected customers over a given period. A load profile is a sequence of power consumption values measured by a smart meter at specified time intervals. A set of load profiles from selected customers can be used to analyze power consumption patterns and their changes over time. By tracking the changes of power consumption patterns in the load profile data of factories, we can detect variations of power consumption in manufacturing processes, understand the work patterns of factories, and predict their productivity. Information about the changes of power consumption patterns has numerous applications in the power distribution industry, such as determining optimal tariff rates, load forecasting, and energy demand management.
Clusters of load profiles in a given time window represent the power consumption patterns of users in that period. These patterns reflect the power consumption behavior of users. Understanding how power consumption patterns change over time is very important for smart grid management. Since load profile data are streaming data, approaches from concept drift analysis of data streams can be adopted to analyze the changes of power consumption patterns in load profile data [6], [30].
In this paper, we present a concept drift approach for tracking the changes of power consumption patterns of manufacturing factories over time. The load profiles of selected factories are represented as a matrix where each row is the load profile of a factory, and each column is a set of power consumption measurements at a time slot. The matrix is vertically divided into a sequence of sub-matrices, each one representing the sub-load profiles of factories in a time window. Given this data representation, tracking the changes of power consumption patterns is achieved by finding the patterns in each time window and the changes of patterns along the sequence of windows. Solving this task involves two challenges. The first is to estimate the number of power consumption patterns, and the other is to define the procedure for change detection [19].
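The data representation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function name `split_into_windows`, the `window_size` parameter, and the toy matrix are hypothetical, and equal-width, non-overlapping windows are assumed.

```python
import numpy as np

def split_into_windows(X, window_size):
    """Cut an N x J load-profile matrix column-wise into consecutive windows.

    Each returned sub-matrix keeps all N rows (factories) and covers
    `window_size` consecutive time slots; the last one may be narrower.
    """
    n_slots = X.shape[1]
    return [X[:, start:start + window_size]
            for start in range(0, n_slots, window_size)]

# Toy data: 4 factories, 12 time slots (arbitrary values, illustration only).
X = np.arange(48, dtype=float).reshape(4, 12)
windows = split_into_windows(X, window_size=4)
print([w.shape for w in windows])  # [(4, 4), (4, 4), (4, 4)]
```

Each element of `windows` plays the role of one sub-matrix of sub-load profiles, so a clustering step can be run on each of them independently.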
This work proposes an algorithm for tracking the Changes of Patterns in Load Profiles (CPLP) of factories. The algorithm involves two steps. In the first step, we propose an ensemble clustering method to cluster the load profiles in each time window. A hierarchical binary k-means algorithm is used to generate multiple component clusterings. The output of this algorithm is a tree of clusters, which represents a component clustering. We define a new stopping criterion for the tree generation process. By changing the input parameter value of tree generation, we generate multiple component clusterings from the same load profile data in a time window. We use a new objective function to combine the multiple component clusterings into a single ensemble clustering. In the second step, we track the changes of power consumption patterns along the time windows by using the clustering results obtained in all time windows. We define a new method to model two related clusters and check the change between the distributions of the two models. Two clusters are related if they have the maximum intersection of objects in neighboring windows. Based on the changes of distributions of related clusters, we define the survival pairs of clusters along the time windows and use them to analyze the changes of power consumption patterns.
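The notion of related clusters can be illustrated with a small sketch. It assumes that each clustering is given as a mapping from a cluster label to the set of factory identifiers it contains; the function name `related_clusters`, the labels, and the toy memberships are hypothetical, and the tie-breaking rule (first cluster encountered) is an assumption rather than something stated in the paper.

```python
def related_clusters(clustering_a, clustering_b):
    """Match each cluster of window A to the cluster of window B
    that shares the largest number of members with it.

    Both arguments map a cluster label to a set of object ids.
    Returns {label_a: (label_b, overlap)}; label_b is None when the
    cluster shares no member with any cluster of window B.
    Ties are resolved in favor of the first cluster encountered.
    """
    related = {}
    for label_a, members_a in clustering_a.items():
        best_label, best_overlap = None, 0
        for label_b, members_b in clustering_b.items():
            overlap = len(members_a & members_b)
            if overlap > best_overlap:
                best_label, best_overlap = label_b, overlap
        related[label_a] = (best_label, best_overlap)
    return related

# Toy clusterings of factory ids in two neighboring windows.
window_1 = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
window_2 = {"P": {1, 2, 3}, "Q": {4, 5, 6, 7}}
print(related_clusters(window_1, window_2))
# {'A': ('P', 3), 'B': ('Q', 3)}
```

The sketch covers only the matching step; comparing the distributions of two related clusters is a separate step that is not shown here.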
We present a series of experiments conducted on both synthetic and real data. Experiments on synthetic data have shown that the CPLP algorithm significantly outperformed well-known state-of-the-art algorithms in almost all experiments. The experiments on a real-world load profile data set demonstrated that the CPLP algorithm is able to track the changes of power consumption patterns of different factory groups and identify the periods of significant change. The data set contains over 20,000 load profiles collected from manufacturing industries in Guangdong province of China over a period of six months in 2012. The power consumption measurements were collected at 15-minute intervals. The results obtained from these data are valuable for applications in the power distribution industry, for example for recommending suitable products to users and satisfying users' demands as far as possible.
The rest of this paper is organized as follows. Section 2 introduces the load profile data and the research problem. Section 3 reviews the related work. Section 4 presents the ensemble clustering method for load profile data. Section 5 presents the method for tracking the changes of power consumption patterns. Section 6 describes the experiments on synthetic data. Experimental results on real-world data are shown in Section 7. Some concluding remarks are given in Section 8.
Load profile data and research problem
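The load profile data of N factories and J time slot measurements are represented as an N × J matrix

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1J} \\ x_{21} & x_{22} & \cdots & x_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NJ} \end{pmatrix}$$

where each row of X represents the load profile of one factory and each column is a set of N power consumption measurements at a time slot. The element $x_{ij}$ is the measurement of the ith factory at the jth time slot. By vector representation, X can be written as

$$X = (X_1, X_2, \ldots, X_N)^{\top} = (Y_1, Y_2, \ldots, Y_J)$$

where $X_i = (x_{i1}, \ldots, x_{iJ})$ is the vector of the load profile of factory i and $Y_j = (x_{1j}, \ldots, x_{Nj})^{\top}$ is the vector of consumption values of the N factories at time slot j. The sequence of Y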
Related work
Change detection in data streams has been widely investigated due to its broad application potential across science and technology, for example in fraud detection, market analysis, medical condition monitoring, and network traffic control [1]. Many studies have proposed approaches for choosing, sampling, splitting, growing, and shrinking the distributions of the data in windows for optimal change detection [4]. For example, the Kullback–Leibler (KL) divergence [9] is used
Ensemble clustering method
In this section, we present a new ensemble clustering method to discover power consumption patterns modeled as clusters in each time window. We first describe a hierarchical binary k-means algorithm for generating the component clusterings. The advantage of this algorithm is that it does not require the number of clusters to be known in advance. Then, we present a new objective function for generating the clustering ensemble from the component clusterings.
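To make the idea of a hierarchical binary k-means tree concrete, here is a minimal sketch of the general technique (repeatedly splitting a node in two with k-means, k = 2). It is not the authors' algorithm: their stopping criterion and objective function are not given in this excerpt, so the criterion used below (stop when the mean squared distance to the node centroid is at most `max_scatter`, or the node has fewer than `min_size` members) is an assumption, as are all function and parameter names and the toy data.

```python
import numpy as np

def two_means(X, n_iter=50):
    """Plain k-means with k=2 and a deterministic farthest-point init."""
    center = X.mean(axis=0)
    c0 = X[np.argmax(((X - center) ** 2).sum(axis=1))]  # farthest from centroid
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]      # farthest from c0
    centers = np.stack([c0, c1])
    labels = None
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def binary_kmeans_tree(X, max_scatter, min_size=2):
    """Grow a binary tree of clusters; return the leaves as index arrays.

    Assumed stopping rule: a node becomes a leaf when its mean squared
    distance to its centroid is <= max_scatter or it has < min_size rows.
    """
    X = np.asarray(X, dtype=float)
    leaves, stack = [], [np.arange(X.shape[0])]
    while stack:
        idx = stack.pop()
        pts = X[idx]
        scatter = ((pts - pts.mean(axis=0)) ** 2).sum(axis=1).mean()
        if len(idx) < min_size or scatter <= max_scatter:
            leaves.append(idx)
            continue
        labels = two_means(pts)
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:  # split failed, keep as leaf
            leaves.append(idx)
            continue
        stack.extend([left, right])
    return leaves

# Toy data: three tight groups of five 2-D points on a line.
offsets = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5]])
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [20.0, 0.0]])
X = np.vstack([c + offsets for c in blob_centers])

fine = binary_kmeans_tree(X, max_scatter=1.0)     # stricter threshold
coarse = binary_kmeans_tree(X, max_scatter=10.0)  # looser threshold
print(len(fine), len(coarse))  # 3 2
```

The number of clusters is not passed in; it falls out of the stopping rule. Running the same data with different threshold values yields different leaf sets, which is one simple way to obtain several component clusterings for an ensemble.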
Method for tracking the changes of power consumption patterns
Given the sequence of sub-matrices in B consecutive time windows, we can use the ensemble clustering method discussed in Section 4 to generate B ensemble clusterings λ_1, λ_2, ..., λ_B, one for each time window. Let λ_i and λ_{i+1} be two ensemble clusterings of load profiles in two neighboring time windows W_i and W_{i+1}, each consisting of a set of clusters. For any cluster C1 from λ_i, we can find a set of clusters in λ_{i+1} which have intersections of load profiles with C1, as shown in
Experiments on synthetic data
The motivation for developing the CPLP algorithm is to cluster noisy power consumption data and compute the change between two data distributions. Synthetic data is often used to validate a clustering algorithm [24]. To better understand the properties of the CPLP algorithm, synthetic data with different structures containing data noise were first used to investigate the clustering accuracy of the ensemble clustering method in comparison with other ensemble clustering algorithms,
Experimental results on real-world data
In this section, we report a series of experiments on real-world load profile data conducted to verify the performance of the CPLP algorithm. First, the load profile data and the experiment settings are described. Then, the experimental results are provided with a detailed discussion.
Conclusion
In this paper, we have presented a new CPLP algorithm for tracking the changes of power consumption patterns in load profile data of factories. It contains two parts. In the first part, we have used a new ensemble clustering method to cluster the load profile data of each time window and used the clusters to model the power consumption patterns. We have developed a new hierarchical binary k-means algorithm to generate high-quality component clusterings without sacrificing too much diversity.
Acknowledgments
This work was supported by the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284), and National Natural Science Foundation of China (NSFC) Grant No. 61750110536. This work was partially supported by GDNSF fund (2015A030313782), and SUSTechStarup fund (Y01236215).
References (32)
- et al., Mudi-stream: a multi density clustering algorithm for evolving data stream, J. Network Comput. Appl. (2016)
- et al., Trend analysis of categorical data streams with a concept change method, Inf. Sci. (2014)
- Overview and performance assessment of the clustering methods for electrical load pattern grouping, Energy (2012)
- et al., Hierarchical cluster ensemble model based on knowledge granulation, Knowl. Based Syst. (2016)
- et al., Stratified feature sampling method for ensemble clustering of high dimensional data, Pattern Recognit. (2015)
- et al., Incremental density-based ensemble clustering over evolving data streams, Neurocomputing (2016)
- et al., The validation of four ultrametric clustering algorithms, Pattern Recognit. (1980)
- et al., Fuzzy c-means clustering of incomplete data based on probabilistic information granules of missing values, Knowl. Based Syst. (2016)
- et al., Clinical charge profiles prediction for patients diagnosed with chronic diseases using multi-level support vector machine, Expert Syst. Appl. (2012)
- Data Streams: Models and Algorithms, Vol. 31 (2007)
- Smart meter driven segmentation: what your consumption says about you, IEEE Trans. Power Syst.
- Learning from time-changing data with adaptive windowing, SDM
- Network-based intrusion detection using neural networks, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York
- Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval, IEEE Trans. Pattern Anal. Mach. Intell.
- An information-theoretic approach to detecting changes in multi-dimensional data streams, In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications, Citeseer
- Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.