Short-term load forecasting with clustering–regression model in distributed cluster

Cluster Computing

Abstract

This paper tackles a new challenge in power big data: how to improve the precision of short-term load forecasting on large-scale data sets. The proposed load forecasting method is based on the Spark platform and a “clustering–regression” model, implemented with the Apache Spark machine learning library (MLlib). The scheme first clusters users with different electrical attributes and then obtains the “load characteristic curve” of each cluster, which represents the features of each type of user and is treated as a property of the regional total load. The “clustering–regression” model is then used to forecast the power load of a given region. Extensive experiments show that the proposed scheme predicts short-term power load reasonably well and is highly robust. Compared with the stand-alone model, the proposed method handles large-scale data sets more efficiently and can be applied effectively to power load forecasting.
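
The two-stage idea described in the abstract can be illustrated with a minimal, single-machine sketch in plain Python. It is not the authors' Spark MLlib implementation: the synthetic load shapes (`EVENING`, `MIDDAY`), the per-user scale factors, the farthest-first k-means seeding, and the use of ordinary least squares as the regressor are all assumptions made for illustration, since the abstract does not specify these details. Users are clustered on their load history, each cluster's mean curve serves as its load characteristic curve, and the regional total is regressed on the cluster curves observed one day earlier.

```python
# Toy clustering-regression pipeline (illustrative assumptions throughout).

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def fit_least_squares(rows, targets):
    """Ordinary least squares without intercept, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: 4 time slots per day, 3 identical days of history.
SLOTS, DAYS, K = 4, 3, 2
EVENING = [1.0, 1.0, 2.0, 5.0]  # assumed residential-like daily shape
MIDDAY = [2.0, 6.0, 3.0, 1.0]   # assumed commercial-like daily shape
users = [[s * v for v in EVENING] * DAYS for s in (0.8, 0.9, 1.0, 1.2)]
users += [[s * v for v in MIDDAY] * DAYS for s in (0.8, 1.0, 1.2)]
T = SLOTS * DAYS

# Stage 1: cluster users, then take each cluster's mean curve over time.
labels, curves = kmeans(users, K)

# Stage 2: regress the regional total on the cluster curves one day earlier.
total = [sum(u[t] for u in users) for t in range(T)]
rows = [[curves[c][t - SLOTS] for c in range(K)] for t in range(SLOTS, T)]
weights = fit_least_squares(rows, total[SLOTS:])

# Forecast the next day from the last observed day's cluster curves.
forecast = [sum(w * curves[c][T - SLOTS + s] for c, w in enumerate(weights))
            for s in range(SLOTS)]
```

Because the toy history is strictly periodic and the regional total is exactly the size-weighted sum of the cluster curves, the fitted weights equal the cluster sizes (4 and 3) and the forecast reproduces the daily total. Real load data would carry noise, trend, and exogenous factors, so the fit would only be approximate there.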



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61472236, 61672337, 61602295, and 61562020), the Natural Science Foundation of Shanghai (No. 16ZR1413100), and the Excellent University Young Teachers Training Program of the Shanghai Municipal Education Commission (No. ZZsdl15105).

Author information

Corresponding author

Correspondence to Jiawei Hao.

About this article

Cite this article

Lei, J., Jin, T., Hao, J. et al. Short-term load forecasting with clustering–regression model in distributed cluster. Cluster Comput 22 (Suppl 4), 10163–10173 (2019). https://doi.org/10.1007/s10586-017-1198-4

  • DOI: https://doi.org/10.1007/s10586-017-1198-4
