Abstract
Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this article, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include offline feature extraction and model building as well as a framework for online anomaly detection that we propose. Second, since obtaining real smart meter data is difficult due to privacy issues, we present an algorithm for generating large realistic datasets from a small seed of real data. Third, we implement the proposed benchmark using five representative platforms: a traditional numeric computing platform (Matlab), a relational DBMS with a built-in machine learning toolkit (PostgreSQL/MADlib), a main-memory column store (“System C”), and two distributed data processing platforms (Hive and Spark/Spark Streaming). We compare the five platforms in terms of application development effort and performance on a multicore machine as well as a cluster of 16 commodity servers.
Index Terms
- Smart Meter Data Analytics: Systems, Algorithms, and Benchmarking