Aggregating correlated cold data to minimize the performance degradation and power consumption of cold storage nodes

The Journal of Supercomputing

Abstract

In the era of big data, traditional storage systems face a serious energy-consumption challenge. Switching storage nodes that are not serving workloads to a low-power state is a typical approach to reducing energy consumption. This method divides the storage nodes into an active group and a low-power group: frequently accessed data are stored in the active group, whose nodes remain active to serve requests, while infrequently accessed cold data are stored in the low-power group. The storage nodes in the low-power group are commonly called cold nodes, because they can be switched to a low-power state for extended periods to save energy. An often-neglected fact is that the placement of cold data on these cold nodes has a significant impact on system performance and power consumption, since switching a storage node from a low-power state back to an active state incurs a substantial delay and additional energy consumption. This paper proposes to aggregate correlated cold data and store them on the same cold node within the low-power group. Because correlated data are normally accessed together, our approach greatly reduces the number of power state transitions and lengthens the idle periods experienced by cold nodes, thereby minimizing both performance degradation and power consumption. Experimental results demonstrate that, compared with several state-of-the-art methods, the proposed approach effectively reduces energy consumption while maintaining system performance at an acceptable level.
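To make the idea concrete, the following is a minimal, illustrative sketch (not the authors' algorithm) of how correlated cold files might be grouped from an access trace and packed onto the same cold node. The correlation window, co-access threshold, and node capacity are hypothetical parameters chosen only for the example.

```python
# Illustrative sketch, assuming: correlations are mined by counting co-accesses
# within a fixed-size trace window, and each cold node holds a fixed number of
# files. These are assumptions for the example, not values from the paper.
from collections import defaultdict
from itertools import combinations


def mine_correlations(access_trace, window=8):
    """Count how often two cold files appear in the same access window."""
    counts = defaultdict(int)
    for i in range(0, len(access_trace), window):
        window_files = set(access_trace[i:i + window])
        for a, b in combinations(sorted(window_files), 2):
            counts[(a, b)] += 1
    return counts


def group_correlated_files(files, counts, threshold=2):
    """Greedily merge files whose co-access count reaches the threshold (union-find)."""
    parent = {f: f for f in files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), c in counts.items():
        if c >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in files:
        groups[find(f)].append(f)
    return list(groups.values())


def place_on_cold_nodes(groups, node_capacity=4):
    """Pack each correlated group onto as few cold nodes as possible."""
    placement, node_id = {}, 0
    for group in groups:
        for i in range(0, len(group), node_capacity):
            for f in group[i:i + node_capacity]:
                placement[f] = node_id
            node_id += 1
    return placement


if __name__ == "__main__":
    trace = ["a", "b", "a", "b", "c", "d", "c", "d", "e", "a", "b", "e"]
    counts = mine_correlations(trace, window=4)
    groups = group_correlated_files(set(trace), counts, threshold=1)
    print(place_on_cold_nodes(groups))
```

In this toy trace, files that repeatedly appear in the same access window end up on the same cold node, so a later burst of correlated accesses wakes a single node instead of several, which is the effect the paper aims for.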



References

  1. Hu C, Deng Y (2015) An energy-aware file relocation strategy based on file-access frequency and correlations. In: Proceedings of the 15th International Conference on Algorithms and Architectures for Parallel Processing, Springer, pp 640–653

  2. Scardapane S, Wang D, Panella M (2016) A decentralized training algorithm for echo state networks in distributed big data applications. Neural Netw 78:65–74

  3. Brown R (2008) Report to Congress on server and data center energy efficiency: Public Law 109-431. Lawrence Berkeley National Laboratory

  4. Wan J, Qu X, Wang J, Xie C (2015) ThinRAID: thinning down RAID array for energy conservation. IEEE Trans Parallel Distrib Syst 26(10):2903–2915

  5. Pinheiro E, Bianchini R, Carrera EV, Heath T (2003) Dynamic cluster reconfiguration for power and performance. In: Benini L, Kandemir M, Ramanujam J (eds) Compilers and operating systems for low power. Springer, Berlin, pp 75–93

  6. Thereska E, Donnelly A, Narayanan D (2011) Sierra: practical power-proportionality for data center storage. In: Proceedings of the Sixth Conference on Computer Systems, ACM, pp 169–182

  7. Entrialgo J, Medrano R, García DF, García J (2015) Autonomic power management with self-healing in server clusters under QoS constraints. Computing 98(9):1–24

  8. Maccio VJ, Down DG (2015) On optimal policies for energy-aware servers. Perform Eval 90:36–52

  9. Ferreira AM, Pernici B (2016) Managing the complex data center environment: an integrated energy-aware framework. Computing 96(7):709–749

  10. Chase JS, Anderson DC, Thakar PN, Vahdat AM, Doyle RP (2001) Managing energy and server resources in hosting centers. ACM SIGOPS Oper Syst Rev 35(5):103–116

  11. Krioukov A et al (2011) Napsac: design and implementation of a power-proportional web cluster. ACM SIGCOMM Comput Commun Rev 41(1):102–108

  12. Okamura H, Miyata S, Dohi T (2016) A Markov decision process approach to dynamic power management in a cluster system. IEEE Access 3:3039–3047

  13. Deng Y, Hu Y, Meng X, Zhu Y, Zhang Z, Han J (2014) Predictively booting nodes to minimize performance degradation of a power-aware web cluster. Clust Comput 17(4):1309–1322

  14. Zhang L, Deng Y, Zhu W, Peng J, Wang F (2015) Skewly replicating hot data to construct a power-efficient storage cluster. J Netw Comput Appl 50:168–179

  15. EMC VNX Virtual Provisioning Applied Technology, White Paper, EMC Corporation (2013)

  16. Staelin C, Garcia-Molina H (1990) Clustering active disk data to improve disk performance. Technical Report CSTR-283-90, Department of Computer Science, Princeton University

  17. Cherkasova L, Ciardo G (2000) Characterizing temporal locality and its impact on web server performance. Technical Report HPL-2000-82, Hewlett Packard Laboratories, July 2000

  18. Gomez ME, Santonja V (2002) Characterizing temporal locality in I/O workload. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems

  19. Pareto Principle, http://en.wikipedia.org/wiki/Pareto_principle

  20. Narayanan D, Donnelly A, Rowstron A (2008) Write off-loading: practical power management for enterprise storage. ACM Trans Storage 4(3):256–267

  21. Weddle C et al (2007) PARAID: a gear-shifting power-aware RAID. ACM Trans Storage 3(3):13

  22. Mao B et al (2008) GRAID: a green RAID storage architecture with improved energy efficiency and reliability. In: Proceedings of the 16th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008), IEEE

  23. Bui DM, Nguyen HQ, Yoon Y, Jun S, Amin MB, Lee S (2015) Gaussian process for predicting CPU utilization and its application to energy efficiency. Appl Intell 43(4):874–891

  24. Deng Y (2011) What is the future of disk drives, death or rebirth? ACM Comput Surv 43(3):23

  25. Patterson DA, Gibson G, Katz RH (1988) A case for redundant arrays of inexpensive disks (RAID). In: Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data (SIGMOD ’88), pp 109–116

  26. Tait CD, Duchamp D (1991) Detection and exploitation of file working sets. In: Proceedings of the 11th International Conference on Distributed Computing Systems, pp 2–9

  27. Lei H, Duchamp D (1997) An analytical approach to file prefetching. In: Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC ’97)

  28. Kroeger TM, Long DDE (1999) The case for efficient file access pattern modeling. In: Proceedings of the 7th Workshop on Hot Topics in Operating Systems, IEEE, pp 14–19

  29. Kroeger TM, Long DDE (2001) Design and implementation of a predictive file prefetching algorithm. In: Proceedings of the General Track: 2001 USENIX Annual Technical Conference, pp 105–118

  30. Ishii Y, Inaba M, Hiraki K (2011) Access map pattern matching for high performance data cache prefetch. J Instr Level Parallelism 13:1–24

  31. Wu Y, Otagiri K, Watanabe Y, Yokota H (2011) A file search method based on intertask relationships derived from access frequency and RMC operations on files. In: Proceedings of the 22nd International Conference on Database and Expert Systems Applications (DEXA ’11), pp 364–378

  32. He J, Sun XH, Thakur R (2012) KNOWAC: I/O prefetch via accumulated knowledge. In: Proceedings of the 2012 IEEE International Conference on Cluster Computing, pp 429–437

  33. Jiang S, Ding X, Xu Y, Davis K (2013) A prefetching scheme exploiting both data layout and access history on disk. ACM Trans Storage 9(3):1–23

  34. Xia P, Feng D, Jiang H, Tian L, Wang F (2008) FARMER: a novel approach to file access correlations mining and evaluation reference model for optimizing peta-scale file system performance. In: Proceedings of the 17th International Symposium on High Performance Distributed Computing, ACM

  35. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2):207–216

  36. Iritani M, Yokota H (2012) Effects on performance and energy reduction by file relocation based on file-access correlations. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops (EDBT-ICDT ’12), ACM, pp 79–86

  37. Aye KN, Thein T (2015) A platform for big data analytics on distributed scale-out storage system. Int J Big Data Intell 2(2):127–141

  38. Lin W, Wu W, Wang H, Wang JZ, Hsu CH (2016) Experimental and quantitative analysis of server power model for cloud data centers. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2016.11.034

  39. Sarwesh P et al (2017) Effective integration of reliable routing mechanism and energy efficient node placement technique for low power IoT networks. Int J Grid High Perform Comput 9(4):16–35

  40. Xie J, Deng Y, Min G, Zhou Y (2017) An incrementally scalable and cost-efficient interconnection structure for datacenters. IEEE Trans Parallel Distrib Syst 28(6):1578–1592

  41. Deng Y (2009) Deconstructing network attached storage systems. J Netw Comput Appl 32(5):1064–1072

  42. Li Z, Chen Z, Srinivasan SM, Zhou Y (2004) C-Miner: mining block correlations in storage systems. In: Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST ’04), pp 173–186

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61572232, in part by the Science and Technology Planning Project of Guangzhou under Grant 201604016100, in part by the Science and Technology Planning Project of Nansha under Grant 2016CX007, and in part by the Open Research Fund of the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, under Grant CARCH201705. The corresponding author is Yuhui Deng from Jinan University.

Author information

Corresponding author

Correspondence to Yuhui Deng.

Additional information

A preliminary version of this paper [1] appeared in the Proceedings of the 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2015). In this paper, we substantially elaborate the motivation and related work, expand the relevant algorithms, broaden the architecture analysis, and enrich the experiments.

Cite this article

Hu, C., Deng, Y. Aggregating correlated cold data to minimize the performance degradation and power consumption of cold storage nodes. J Supercomput 75, 662–687 (2019). https://doi.org/10.1007/s11227-018-2366-x

