Aggregating correlated cold data to minimize the performance degradation and power consumption of cold storage nodes

The Journal of Supercomputing

Abstract

In the era of big data, traditional storage systems face a serious energy-consumption challenge. Switching storage nodes that are not serving workloads to a low-power state is a typical approach to reducing energy consumption. This method divides the storage nodes into an active group and a low-power group: frequently accessed data are stored in the active group, whose nodes remain active to serve requests, while infrequently accessed cold data are stored in the low-power group. The storage nodes in the low-power group are commonly called cold nodes, because they can be switched to a low-power state for extended periods to save energy. An often-neglected fact is that the placement of cold data on these cold nodes has a significant impact on system performance and power consumption, since switching a storage node from a low-power state back to an active state incurs a substantial delay and additional energy consumption. This paper proposes to aggregate correlated cold data and store them on the same cold node within the low-power group. Because correlated data are normally accessed together, our approach greatly reduces the number of power state transitions and lengthens the idle periods experienced by cold nodes, thereby minimizing both performance degradation and power consumption. Experimental results demonstrate that, compared with several state-of-the-art methods, the proposed approach effectively reduces energy consumption while maintaining system performance at an acceptable level.
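To make the idea concrete, the following is a minimal, illustrative sketch (not the authors' algorithm) of how correlated cold files might be grouped from an access trace and packed onto the same cold node. The correlation window, co-access threshold, and node capacity are hypothetical parameters chosen only for the example.

```python
# Illustrative sketch, assuming: correlations are mined by counting co-accesses
# within a fixed-size trace window, and each cold node holds a fixed number of
# files. These are assumptions for the example, not values from the paper.
from collections import defaultdict
from itertools import combinations


def mine_correlations(access_trace, window=8):
    """Count how often two cold files appear in the same access window."""
    counts = defaultdict(int)
    for i in range(0, len(access_trace), window):
        window_files = set(access_trace[i:i + window])
        for a, b in combinations(sorted(window_files), 2):
            counts[(a, b)] += 1
    return counts


def group_correlated_files(files, counts, threshold=2):
    """Greedily merge files whose co-access count reaches the threshold (union-find)."""
    parent = {f: f for f in files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), c in counts.items():
        if c >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for f in files:
        groups[find(f)].append(f)
    return list(groups.values())


def place_on_cold_nodes(groups, node_capacity=4):
    """Pack each correlated group onto as few cold nodes as possible."""
    placement, node_id = {}, 0
    for group in groups:
        for i in range(0, len(group), node_capacity):
            for f in group[i:i + node_capacity]:
                placement[f] = node_id
            node_id += 1
    return placement


if __name__ == "__main__":
    trace = ["a", "b", "a", "b", "c", "d", "c", "d", "e", "a", "b", "e"]
    counts = mine_correlations(trace, window=4)
    groups = group_correlated_files(set(trace), counts, threshold=1)
    print(place_on_cold_nodes(groups))
```

In this toy trace, files that repeatedly appear in the same access window end up on the same cold node, so a later burst of correlated accesses wakes a single node instead of several, which is the effect the paper aims for.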



References

  1. Hu C, Deng Y (2015) An energy-aware file relocation strategy based on file-access frequency and correlations. In: Proceedings of the 15th International Conference on Algorithms and Architectures for Parallel Processing, Springer, pp 640–653

  2. Scardapane S, Wang D, Panella M (2016) A decentralized training algorithm for echo state networks in distributed big data applications. Neural Netw 78:65–74

  3. Brown R (2008) Report to Congress on server and data center energy efficiency: Public Law 109-431. Lawrence Berkeley National Laboratory

  4. Wan J, Qu X, Wang J, Xie C (2015) ThinRAID: thinning down RAID array for energy conservation. IEEE Trans Parallel Distrib Syst 26(10):2903–2915

  5. Pinheiro E, Bianchini R, Carrera EV, Heath T (2003) Dynamic cluster reconfiguration for power and performance. In: Benini L, Kandemir M, Ramanujam J (eds) Compilers and operating systems for low power. Springer, Berlin, pp 75–93

  6. Thereska E, Donnelly A, Narayanan D (2011) Sierra: practical power-proportionality for data center storage. In: Proceedings of the Sixth Conference on Computer Systems, ACM, pp 169–182

  7. Entrialgo J, Medrano R, García DF, García J (2015) Autonomic power management with self-healing in server clusters under QoS constraints. Computing 98(9):1–24

  8. Maccio VJ, Down DG (2015) On optimal policies for energy-aware servers. Perform Eval 90:36–52

  9. Ferreira AM, Pernici B (2016) Managing the complex data center environment: an integrated energy-aware framework. Computing 96(7):709–749

  10. Chase JS, Anderson DC, Thakar PN, Vahdat AM, Doyle RP (2001) Managing energy and server resources in hosting centers. ACM SIGOPS Oper Syst Rev 35(5):103–116

  11. Krioukov A et al (2011) Napsac: design and implementation of a power-proportional web cluster. ACM SIGCOMM Comput Commun Rev 41(1):102–108

  12. Okamura H, Miyata S, Dohi T (2016) A Markov decision process approach to dynamic power management in a cluster system. IEEE Access 3:3039–3047

  13. Deng Y, Hu Y, Meng X, Zhu Y, Zhang Z, Han J (2014) Predictively booting nodes to minimize performance degradation of a power-aware web cluster. Clust Comput 17(4):1309–1322

  14. Zhang L, Deng Y, Zhu W, Peng J, Wang F (2015) Skewly replicating hot data to construct a power-efficient storage cluster. J Netw Comput Appl 50:168–179

  15. EMC VNX Virtual Provisioning Applied Technology, White Paper, EMC Corporation (2013)

  16. Staelin C, Garcia-Molina H (1990) Clustering active disk data to improve disk performance. Technical Report CSTR-283-90, Department of Computer Science, Princeton University

  17. Cherkasova L, Ciardo G (2000) Characterizing temporal locality and its impact on web server performance. Technical Report HPL-2000-82, Hewlett Packard Laboratories, July 2000

  18. Gomez ME, Santonja V (2002) Characterizing temporal locality in I/O workload. In: Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems

  19. Pareto Principle, http://en.wikipedia.org/wiki/Pareto_principle

  20. Narayanan D, Donnelly A, Rowstron A (2008) Write off-loading: practical power management for enterprise storage. ACM Trans Storage 4(3):256–267

  21. Weddle C et al (2007) PARAID: a gear-shifting power-aware RAID. ACM Trans Storage 3(3):13

  22. Mao B et al (2008) GRAID: a green RAID storage architecture with improved energy efficiency and reliability. In: Proceedings of the 16th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008), IEEE

  23. Bui DM, Nguyen HQ, Yoon Y, Jun S, Amin MB, Lee S (2015) Gaussian process for predicting CPU utilization and its application to energy efficiency. Appl Intell 43(4):874–891

  24. Deng Y (2011) What is the future of disk drives, death or rebirth? ACM Comput Surv 43(3):23

  25. Patterson DA, Gibson G, Katz RH (1988) A case for redundant arrays of inexpensive disks (RAID). In: Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data (SIGMOD ’88), pp 109–116

  26. Tait CD, Duchamp D (1991) Detection and exploitation of file working sets. In: Proceedings of the 11th International Conference on Distributed Computing Systems, pp 2–9

  27. Lei H, Duchamp D (1997) An analytical approach to file prefetching. In: Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC ’97)

  28. Kroeger TM, Long DDE (1999) The case for efficient file access pattern modeling. In: Proceedings of the 7th Workshop on Hot Topics in Operating Systems, IEEE, pp 14–19

  29. Kroeger TM, Long DDE (2001) Design and implementation of a predictive file prefetching algorithm. In: Proceedings of the General Track: 2001 USENIX Annual Technical Conference, pp 105–118

  30. Ishii Y, Inaba M, Hiraki K (2011) Access map pattern matching for high performance data cache prefetch. J Instr Level Parallelism 13:1–24

  31. Wu Y, Otagiri K, Watanabe Y, Yokota H (2011) A file search method based on intertask relationships derived from access frequency and RMC operations on files. In: Proceedings of the 22nd International Conference on Database and Expert Systems Applications (DEXA ’11), pp 364–378

  32. He J, Sun XH, Thakur R (2012) KNOWAC: I/O prefetch via accumulated knowledge. In: Proceedings of the 2012 IEEE International Conference on Cluster Computing, pp 429–437

  33. Jiang S, Ding X, Xu Y, Davis K (2013) A prefetching scheme exploiting both data layout and access history on disk. ACM Trans Storage 9(3):1–23

  34. Xia P, Feng D, Jiang H, Tian L, Wang F (2008) FARMER: a novel approach to file access correlations mining and evaluation reference model for optimizing peta-scale file system performance. In: Proceedings of the 17th International Symposium on High Performance Distributed Computing, ACM

  35. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. ACM SIGMOD Record 22(2):207–216

  36. Iritani M, Yokota H (2012) Effects on performance and energy reduction by file relocation based on file-access correlations. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops (EDBT-ICDT ’12), ACM, pp 79–86

  37. Aye KN, Thein T (2015) A platform for big data analytics on distributed scale-out storage system. Int J Big Data Intell 2(2):127–141

  38. Lin W, Wu W, Wang H, Wang JZ, Hsu CH (2016) Experimental and quantitative analysis of server power model for cloud data centers. Future Gener Comput Syst. https://doi.org/10.1016/j.future.2016.11.034

  39. Sarwesh P et al (2017) Effective integration of reliable routing mechanism and energy efficient node placement technique for low power IoT networks. Int J Grid High Perform Comput 9(4):16–35

  40. Xie J, Deng Y, Min G, Zhou Y (2017) An incrementally scalable and cost-efficient interconnection structure for datacenters. IEEE Trans Parallel Distrib Syst 28(6):1578–1592

  41. Deng Y (2009) Deconstructing network attached storage systems. J Netw Comput Appl 32(5):1064–1072

  42. Li Z, Chen Z, Srinivasan SM, Zhou Y (2004) C-Miner: mining block correlations in storage systems. In: Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST ’04), pp 173–186

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61572232, in part by the Science and Technology Planning Project of Guangzhou under Grant 201604016100, in part by the Science and Technology Planning Project of Nansha under Grant 2016CX007, and in part by the Open Research Fund of the Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, under Grant CARCH201705. The corresponding author is Yuhui Deng from Jinan University.

Author information

Corresponding author

Correspondence to Yuhui Deng.

Additional information

A preliminary version of this paper [1] appeared in the Proceedings of the 15th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2015). In this paper, we substantially elaborate the motivation and related work, expand the relevant algorithms, broaden the architecture analysis, and enrich the experiments.

Cite this article

Hu, C., Deng, Y. Aggregating correlated cold data to minimize the performance degradation and power consumption of cold storage nodes. J Supercomput 75, 662–687 (2019). https://doi.org/10.1007/s11227-018-2366-x

