Abstract
This paper tackles a new challenge in power big data: how to improve the precision of short-term load forecasting on large-scale data sets. The proposed load forecasting method is built on the Spark platform and a “clustering–regression” model, implemented with the Apache Spark machine learning library (MLlib). The scheme first clusters users with different electrical attributes and then obtains the load characteristic curve of each cluster, which represents the features of each type of user and is treated as a property of the regional total load. The “clustering–regression” model is then used to forecast the power load of a given region. Extensive experiments show that the proposed scheme predicts short-term power load reasonably well and is highly robust. Compared with the stand-alone model, the proposed method processes large-scale data sets more efficiently and can be applied effectively to power load forecasting.
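The abstract describes a two-stage pipeline: cluster users by their load profiles, take each cluster's load characteristic curve, then use those curves as regressors for the regional total load. The sketch below illustrates one plausible reading of that pipeline in plain Python rather than Spark MLlib, so it runs without a cluster. Everything in it is an assumption for illustration only: the helper names (`kmeans`, `fit_least_squares`, `predict`) are hypothetical, the characteristic curve is taken to be the cluster's mean profile, k-means uses a deterministic farthest-point initialisation, the regression is ordinary least squares without an intercept, and the four user profiles are made-up toy data, not data from the paper.

```python
# Hypothetical sketch of a "clustering-regression" pipeline in plain Python.
# Not the authors' Spark MLlib implementation; names and data are illustrative.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_curve(profiles):
    """Point-wise mean of several profiles (assumed to be a cluster's characteristic curve)."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def kmeans(profiles, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption;
    Spark MLlib's KMeans uses k-means|| initialisation by default)."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        # Next seed: the profile farthest from all seeds chosen so far
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: each user joins its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles]
        # Update step: each centroid becomes the mean profile of its members
        new = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            new.append(mean_curve(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Ordinary least squares without intercept, via the normal equations X^T X w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, rows):
    """Regional load estimate: weighted sum of the cluster characteristic curves."""
    return [sum(w * x for w, x in zip(weights, r)) for r in rows]


# Toy data (made up): four users, four time steps each
users = [
    [1.0, 5.0, 5.0, 1.0],  # daytime-peaking users
    [1.2, 4.8, 5.1, 0.9],
    [5.0, 1.0, 1.0, 5.0],  # night-peaking users
    [4.9, 1.1, 0.8, 5.2],
]
labels, curves = kmeans(users, k=2)
# Regional total load at each time step
regional_total = [sum(step) for step in zip(*users)]
# One feature row per time step: every cluster's characteristic-curve value at that step
rows = [list(step) for step in zip(*curves)]
weights = fit_least_squares(rows, regional_total)
fitted = predict(weights, rows)
print(labels, [round(w, 3) for w in weights])
```

On these made-up profiles the two clusters separate the daytime- and night-peaking users, and both weights come out close to 2 because the toy regional total is exactly two users' worth of each characteristic curve. The sketch only fits in-sample; a forecast would apply the fitted weights to characteristic curves for the target period, and a distributed version would presumably swap these hand-written helpers for MLlib's KMeans and LinearRegression estimators.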
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61472236, 61672337, 61602295, and 61562020, the Natural Science Foundation of Shanghai (No. 16ZR1413100), and the Excellent University Young Teachers Training Program of Shanghai Municipal Education Commission (No. ZZsdl15105).
Cite this article
Lei, J., Jin, T., Hao, J. et al. Short-term load forecasting with clustering–regression model in distributed cluster. Cluster Comput 22 (Suppl 4), 10163–10173 (2019). https://doi.org/10.1007/s10586-017-1198-4