Heuristic attribute reduction and resource-saving algorithm for energy data of data centers

Chen, Mincheng; Yuan, Jingling; Li, Lin; Liu, Dongling; He, Yang

doi:10.1007/s10115-018-1288-5

Regular Paper
Published: 17 December 2018

Volume 61, pages 277–299, (2019)

Knowledge and Information Systems

Abstract

Energy data in green data centers, which consist of energy consumption statistics and other related data, are growing dramatically. These data have great value, but many of their attributes are redundant or unnecessary and seriously degrade the performance of the data center's decision-making system. Attribute reduction is therefore a critical step for energy data. However, many existing attribute reduction algorithms are computationally time-consuming. To address these issues, we first extend the methodology of rough sets to construct a knowledge representation system for data center energy consumption. Because energy data will contain some exceptions caused by power failures, energy instability or other factors, we design an integrated Spark-based preprocessing method for energy data, which mainly includes sampling analysis, data classification, missing data filling, outlier data prediction and data discretization. Taking advantage of in-memory computing, we then propose a fast heuristic attribute reduction algorithm for energy data using Spark (FHARA-S). This algorithm uses an efficient method for transforming the energy consumption decision table and a heuristic formula that measures attribute significance to reduce the search space, and it introduces the correlation between condition attributes and the decision attribute, which further improves computational efficiency. We also design an adaptive decision management architecture for the green data center based on FHARA-S, which can improve decision-making efficiency and strengthen energy management. The experimental results show that our algorithm achieves up to a 2.2X performance improvement over the traditional attribute reduction algorithm using MapReduce and a 0.61X performance improvement over the algorithm using Spark. In addition, our algorithm saves more computational resources.
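To make the idea of "measuring the significance of attributes to reduce the search space" concrete, the following is a minimal single-machine sketch of the classical positive-region (dependency-degree) heuristic reduction from rough set theory. It is not FHARA-S: the paper's own significance formula, its decision-table transformation, the condition/decision correlation term and the Spark parallelization are not reproduced here. The function names and the toy table (attributes `cpu`, `temp`, `fan`, decision `power`) are hypothetical, and the table is assumed to be already discretized.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# One object (row) of a discretized decision table: attribute name -> value.
Row = Dict[str, Hashable]


def positive_region_size(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> int:
    """|POS_B(D)|: number of objects whose B-equivalence class has one decision value."""
    decisions: Dict[Tuple[Hashable, ...], Set[Hashable]] = defaultdict(set)
    sizes: Dict[Tuple[Hashable, ...], int] = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)  # indiscernibility class under attrs
        decisions[key].add(row[decision])
        sizes[key] += 1
    # A class belongs to the positive region iff it is consistent (single decision value).
    return sum(sizes[k] for k, values in decisions.items() if len(values) == 1)


def heuristic_reduct(rows: Sequence[Row], conditions: Sequence[str], decision: str) -> List[str]:
    """Greedy forward selection by dependency degree, then backward pruning."""
    target = positive_region_size(rows, conditions, decision)  # |POS_C(D)|
    reduct: List[str] = []
    remaining = list(conditions)
    current = positive_region_size(rows, reduct, decision)
    # Forward step: sig(a) = gamma_{R+a} - gamma_R with gamma_B = |POS_B(D)| / |U|.
    # gamma_R and |U| are fixed within one iteration, so maximizing |POS_{R+a}(D)|
    # maximizes significance.
    while current < target and remaining:
        best = max(remaining, key=lambda a: positive_region_size(rows, reduct + [a], decision))
        reduct.append(best)
        remaining.remove(best)
        current = positive_region_size(rows, reduct, decision)
    # Backward step: drop any attribute whose removal keeps the positive region intact.
    for a in reversed(list(reduct)):
        trial = [b for b in reduct if b != a]
        if positive_region_size(rows, trial, decision) == target:
            reduct = trial
    return reduct


# Hypothetical toy table: discretized CPU load, inlet temperature and fan state
# as condition attributes, discretized power draw as the decision attribute.
energy_rows: List[Row] = [
    {"cpu": "high", "temp": "high", "fan": "on", "power": "high"},
    {"cpu": "high", "temp": "low", "fan": "on", "power": "low"},
    {"cpu": "low", "temp": "high", "fan": "off", "power": "low"},
    {"cpu": "low", "temp": "low", "fan": "off", "power": "low"},
    {"cpu": "high", "temp": "high", "fan": "off", "power": "high"},
]

if __name__ == "__main__":
    print(heuristic_reduct(energy_rows, ["cpu", "temp", "fan"], "power"))  # ['cpu', 'temp']
```

In this sketch every candidate evaluation rescans the whole table to rebuild the partition, so the cost grows quickly with the number of objects and attributes; that repeated partitioning is the part one would expect a distributed, in-memory implementation to parallelize.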

Acknowledgements

This research project is supported by the National Natural Science Foundation of China (Grant No: 61303029), the National Social Science Foundation of China (Grant No: 15BGL048), the Hubei Province Science and Technology Support Project (Grant No: 2015BAA072), the Fund for Creative Research Group of the Key Natural Science Foundation of Hubei Province of China (Grant No: 2017CFA012), and the Key Technical Innovation Project of Hubei (Grant No: 2017AAA122).

Author information

Authors and Affiliations

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, China
Mincheng Chen, Jingling Yuan, Lin Li, Dongling Liu & Yang He

Corresponding author

Correspondence to Jingling Yuan.

About this article

Cite this article

Chen, M., Yuan, J., Li, L. et al. Heuristic attribute reduction and resource-saving algorithm for energy data of data centers. Knowl Inf Syst 61, 277–299 (2019). https://doi.org/10.1007/s10115-018-1288-5

Received: 27 February 2018
Revised: 24 May 2018
Accepted: 28 November 2018
Published: 17 December 2018
Issue Date: 01 October 2019
DOI: https://doi.org/10.1007/s10115-018-1288-5
