Elsevier

Information Sciences

Volume 429, March 2018, Pages 332-348

CPLP: An algorithm for tracking the changes of power consumption patterns in load profile data over time

https://doi.org/10.1016/j.ins.2017.11.006

Abstract

In this paper, we propose a novel algorithm for tracking the Changes of Patterns in Load Profile (CPLP) data of factories. CPLP consists of two stages. The first stage clusters the load profiles in each time window and uses the clusters to model the power consumption patterns. We propose a new ensemble clustering method to cluster the load profiles in consecutive time windows. It uses a hierarchical binary k-means algorithm to generate component clusterings and a new objective function to combine them into the final clustering. The second stage tracks the changes of patterns along the time windows. We propose a new method to detect the change of clusters from one window to the next by using the distribution models of two related clusters in two neighboring windows. By using this method, we can link the clusters in the sequence of time windows to track the patterns. Experiments on synthetic and real-world load profile data show that the proposed algorithm is able to track the changes of power consumption patterns of different factory groups and identify the periods of significant change, which is useful for smart grid applications.

Introduction

Load profile data provide information about power consumption behavior of selected customers over a given period. A load profile is a sequence of power consumption values measured by a smart meter at specified time intervals. A set of load profiles from selected customers can be used to analyze power consumption patterns and their changes over time. By tracking the changes of power consumption patterns from load profile data of factories, we can detect variations of power consumption in manufacturing processes, understand the work patterns of factories, and predict the productivity of factories. The information about the changes of power consumption patterns can find numerous applications in the power distribution industry such as determining optimal tariff rates, load forecasting, and energy demand management.

Clusters of load profiles in a given time window represent the power consumption patterns of users in that period. These patterns reflect the power consumption behavior of users. Understanding the changes of power consumption patterns over time is very important for smart grid management. Since load profile data are streaming data, the approaches used in concept drift analysis of data streams can be adopted to analyze the changes of power consumption patterns in load profile data [6], [30].

In this paper, we present a concept drift approach for tracking the changes of power consumption patterns of manufacturing factories over time. The load profiles of selected factories are represented as a matrix where each row is the load profile of a factory, and each column is the set of power consumption measurements at a time slot. The matrix is vertically divided into a sequence of sub-matrices, each one representing the sub-load profiles of factories in a time window. Given this data representation, tracking the changes of power consumption patterns is achieved by finding the patterns in each time window and the changes of patterns along the sequence of windows. This task poses two challenges. The first is to estimate the number of power consumption patterns and the other is to define the procedure for change detection [19].
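The windowed data representation described above can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping windows, which is an assumption made here for illustration; the function name split_into_windows and the parameter window_size are hypothetical and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]  # rows = factories, columns = time slots


def split_into_windows(X: Matrix, window_size: int) -> List[Matrix]:
    """Split an N x J load-profile matrix column-wise into consecutive windows.

    Assumption: windows are non-overlapping and of equal length; the last
    window may be shorter if J is not a multiple of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    n_slots = len(X[0]) if X else 0
    windows = []
    for start in range(0, n_slots, window_size):
        # Every factory keeps its row; only the time-slot columns are sliced.
        windows.append([row[start:start + window_size] for row in X])
    return windows


# Toy data: 3 factories, 8 time slots, windows of 4 slots each.
X = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2],
    [0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5],
    [2.0, 2.1, 1.9, 2.0, 0.1, 0.2, 0.1, 0.0],
]
windows = split_into_windows(X, window_size=4)
```

Each element of windows is a sub-matrix holding the sub-load profiles of all factories in one time window, so clustering can be run on each window independently.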

This work proposes an algorithm for tracking the Changes of Patterns in Load Profiles (CPLP) of factories. The algorithm involves two steps. In the first step, we propose an ensemble clustering method to cluster the load profiles in each time window. A hierarchical binary k-means algorithm is used to generate multiple component clusterings. The output of this algorithm is a tree of clusters which represents a component clustering. We define a new stopping criterion for the tree generation process. By changing the input parameter value of tree generation, we generate multiple component clusterings from the same load profile data in a time window. We use a new objective function to combine the multiple component clusterings into a single clustering. In the second step, we track the changes of power consumption patterns along the time windows by using the clustering results obtained in all time windows. We define a new method to model two related clusters and check the change between the distributions of the two models. Two clusters are related if they have the maximum intersection of objects in neighboring windows. Using the distribution changes obtained for related clusters, we define the survival pairs of clusters along the time windows and use them to analyze the changes of power consumption patterns.
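The notion of related clusters can be sketched as follows. This is only an illustration of the maximum-intersection rule stated above, not the paper's implementation: the function name related_clusters, the dictionary representation of a clustering, and the tie-breaking rule (first cluster encountered wins) are assumptions. The subsequent distribution-change test is not shown.

```python
from typing import Dict, Set

# Assumed representation: cluster label -> set of factory ids in that cluster.
Clustering = Dict[str, Set[int]]


def related_clusters(prev: Clustering, curr: Clustering) -> Dict[str, str]:
    """Link each cluster in `prev` to the cluster in `curr` that shares
    the largest number of factories with it (maximum intersection).

    Clusters with no overlap at all are left unlinked. Ties go to the
    first cluster encountered, which is an arbitrary choice made here.
    """
    links: Dict[str, str] = {}
    for label, members in prev.items():
        best_label, best_overlap = None, 0
        for other_label, other_members in curr.items():
            overlap = len(members & other_members)
            if overlap > best_overlap:
                best_label, best_overlap = other_label, overlap
        if best_label is not None:
            links[label] = best_label
    return links


# Toy clusterings of seven factories in two neighboring windows.
window_1 = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
window_2 = {"P": {1, 2, 3}, "Q": {4, 5, 6}, "R": {7}}
links = related_clusters(window_1, window_2)
```

In this toy example, cluster A shares three factories with P and one with Q, so A is linked to P; cluster B shares two factories with Q and one with R, so B is linked to Q.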

We present a series of experiments conducted on both synthetic and real data. Experiments on synthetic data show that the CPLP algorithm significantly outperformed well-known state-of-the-art algorithms in almost all experiments. The experiments on a real-world load profile data set demonstrate that the CPLP algorithm is able to track the changes of power consumption patterns of different factory groups and identify the periods of significant change. The load profile data contain over 20,000 load profiles collected from manufacturing industries in Guangdong province of China over a period of six months in 2012. The power consumption measurements were collected at 15-minute intervals. The results from these data are relevant to applications in the power distribution industry: they can be used to recommend suitable products to users and to satisfy users' demands as far as possible.

The rest of this paper is organized as follows. Section 2 introduces the load profile data and the research problem. Section 3 reviews the related work. Section 4 presents the ensemble clustering method for load profile data. Section 5 presents the method for tracking the changes of power consumption patterns. Section 6 describes the experiments on synthetic data. Experimental results on real-world data are shown in Section 7. Some concluding remarks are given in Section 8.

Section snippets

Load profile data and research problem

The load profile data of N factories and J time slot measurements are represented as an N × J matrix X, where each row of X represents the load profile of one factory and each column is the set of N power consumption measurements at a time slot. The element xij is the measurement of the ith factory at the jth time slot. By vector representation, X can be written either as a stack of N row vectors Xi or as a sequence of J column vectors Yj,

where Xi is the vector of load profile of factory i and Yj is the vector of consumption values of N factories at time slot j. The sequence of Y

Related work

Change detection in data streams has been widely investigated due to its broad application potential in all walks of science and technology, for example, fraud detection, market analysis, medical condition monitoring, and network traffic control [1]. Many studies have been applied to generate approaches for choosing, sampling, splitting, growing, and shrinking the distributions of the data in windows for optimal change detection [4]. For example, the Kullback–Leibler (KL) [9] divergence is used

Ensemble clustering method

In this section, we present a new ensemble clustering method to discover power consumption patterns modeled as clusters in each time window. We first describe a hierarchical binary k-means algorithm for generating the component clusterings. The advantage of this algorithm is that it does not require the number of clusters to be known in advance. Then, we present a new objective function for generating the clustering ensemble from the component clusterings.
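The idea of a hierarchical binary k-means can be sketched as a recursive 2-means bisection. The sketch below is an assumption-laden illustration, not the algorithm from the paper: the stopping rule used here (stop splitting when the mean squared distance to the centroid falls below a hypothetical max_spread parameter) stands in for the paper's stopping criterion, which is not given in this excerpt, and the ensemble objective function is not shown.

```python
import random
from typing import List, Optional, Tuple

Point = List[float]


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points: List[Point], rng: random.Random,
              n_iter: int = 20) -> Tuple[List[Point], List[Point]]:
    """Plain k-means with k = 2, initialised from two sampled points."""
    centers = rng.sample(points, 2)
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(n_iter):
        left, right = [], []
        for p in points:
            if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]):
                left.append(p)
            else:
                right.append(p)
        if not left or not right:
            break  # degenerate split, e.g. identical sampled centers
        centers = [mean(left), mean(right)]
    return left, right


def hierarchical_binary_kmeans(points: List[Point], max_spread: float,
                               rng: Optional[random.Random] = None
                               ) -> List[List[Point]]:
    """Recursively bisect `points`; return the leaf clusters of the tree.

    Hypothetical stopping rule: a node becomes a leaf when its mean
    squared distance to the centroid is at most `max_spread`.
    """
    if rng is None:
        rng = random.Random(0)
    if len(points) < 2:
        return [points]
    center = mean(points)
    spread = sum(sq_dist(p, center) for p in points) / len(points)
    if spread <= max_spread:
        return [points]
    left, right = two_means(points, rng)
    if not left or not right:
        return [points]
    return (hierarchical_binary_kmeans(left, max_spread, rng)
            + hierarchical_binary_kmeans(right, max_spread, rng))


# Toy data: two well-separated groups of sub-load profiles.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
leaves = hierarchical_binary_kmeans(points, max_spread=0.5)
# Varying the input parameter yields different component clusterings.
components = [hierarchical_binary_kmeans(points, s) for s in (0.5, 100.0)]
```

Because the tree stops growing on its own, the number of leaf clusters is a result of the procedure rather than an input, and varying the stopping parameter gives the multiple component clusterings that an ensemble step would then combine.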

Method for tracking the changes of power consumption patterns

Given the sequence of sub-matrices W1, W2, ..., WB in B consecutive time windows, we can use the ensemble clustering method discussed in Section 4 to generate B ensemble clusterings λ1, λ2, ..., λB, one for each time window. Let λfi and λfi+1 be two ensemble clusterings of load profiles in two neighboring time windows Wi and Wi+1, each consisting of a set of clusters. For any cluster C1 from λfi, we can find a set of clusters in λfi+1 which have intersections of load profiles with C1, as shown in

Experiments on synthetic data

The motivation for developing the CPLP algorithm is to cluster noisy power consumption data and compute the change between two data distributions. Synthetic data are often used to validate a clustering algorithm [24]. To better understand the properties of the CPLP algorithm, synthetic data with different structures and containing noise were first used to investigate the clustering accuracy of the ensemble clustering method in comparison with other ensemble clustering algorithms,

Experimental results on real-world data

In this section, we report a series of experiments on real-world load profile data conducted to verify the performance of the CPLP algorithm. First, the load profile data and the experiment settings are described. Then, the experimental results are presented with a detailed discussion.

Conclusion

In this paper, we have presented a new CPLP algorithm for tracking the changes of power consumption patterns in load profile data of factories. It contains two parts. In the first part, we have used a new ensemble clustering method to cluster the load profile data of each time window and used the clusters to model the power consumption patterns. We have developed a new hierarchical binary k-means algorithm to generate high-quality component clusterings without sacrificing too much diversity.

Acknowledgments

This work was supported by the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284), and National Natural Science Foundation of China (NSFC) Grant No. 61750110536. This work was partially supported by GDNSF fund (2015A030313782), and SUSTechStarup fund (Y01236215).

References (32)

  • A. Albert et al., Smart meter driven segmentation: what your consumption says about you, IEEE Trans. Power Systems (2013)
  • A. Bifet et al., Learning from time-changing data with adaptive windowing, SDM (2007)
  • A. Bivens et al., Network-based intrusion detection using neural networks, Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York (2002)
  • C. Carpineto et al., Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • T. Dasu et al., An information-theoretic approach to detecting changes in multi-dimensional data streams, in: Proc. Symp. on the Interface of Statistics, Computing Science, and Applications (2006)
  • A.P. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. (1977)