Abstract
This paper presents an effective method for metadata rebalancing in exascale distributed file systems. Exponential data growth demands adaptive and robust distributed file systems, which are typically built from a large cluster of metadata servers and data servers. Although each metadata server may initially hold an equal share of the entire metadata set, the placement of metadata across servers eventually becomes globally imbalanced, and the imbalance worsens over time. To keep disproportionate metadata placement from degrading the intrinsic performance of the metadata server cluster, its balanced performance must be restored periodically. This is difficult, however, because rebalancing seriously hampers the normal operation of the file system, and the problem is aggravated at exascale by an ever-present heavy workload and frequent failures of server components. A primary cause of the degradation is that file system clients frequently fail to look up metadata from the metadata server cluster while rebalancing is in progress, so metadata operations cannot proceed at their normal speed. We propose a metadata rebalance model that minimizes failures of metadata operations during the rebalance period and validate it through a cost analysis. The results demonstrate that our model makes online metadata rebalance feasible without obstructing normal operation and increases the chances of maintaining balance in a very large cluster of metadata servers.
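The failure mode the abstract describes, namely clients failing lookups while metadata entries are in flight between servers, can be made concrete with a small sketch. The Python fragment below is our own illustration under stated assumptions, not the paper's model: the class name `MetadataCluster`, the hash-based placement, and the migrated-set marker are hypothetical constructs introduced here. It shows how consulting both the pre- and post-rebalance layouts lets a lookup succeed whether or not a given entry has migrated yet.

```python
import hashlib

class MetadataCluster:
    """Hypothetical sketch: metadata placement by hashing paths onto a
    set of metadata servers (MDSs), with a rebalance window in which
    both the old and new layouts are consulted so client lookups do
    not fail while entries are being migrated."""

    def __init__(self, servers):
        self.old_layout = list(servers)   # layout before rebalance
        self.new_layout = list(servers)   # layout after rebalance
        self.migrated = set()             # paths already moved

    def _place(self, path, layout):
        # Deterministic hash placement of a path onto one server.
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return layout[h % len(layout)]

    def start_rebalance(self, new_servers):
        self.new_layout = list(new_servers)
        self.migrated.clear()

    def migrate(self, path):
        # Background task: move one entry, then record it as migrated.
        self.migrated.add(path)

    def lookup(self, path):
        # A naive client consults only one layout during rebalance and
        # misses entries still in flight; checking the migration marker
        # first avoids the failed lookups the paper identifies as the
        # main source of degraded metadata performance.
        if path in self.migrated:
            return self._place(path, self.new_layout)
        return self._place(path, self.old_layout)

    def finish_rebalance(self):
        self.old_layout = self.new_layout
        self.migrated.clear()

# Usage: lookups keep resolving while entries migrate one by one.
cluster = MetadataCluster(["mds0", "mds1"])
cluster.start_rebalance(["mds0", "mds1", "mds2"])
print(cluster.lookup("/home/alice"))   # served from the old layout
cluster.migrate("/home/alice")
print(cluster.lookup("/home/alice"))   # now served from the new layout
```

A real system would additionally have to handle concurrent updates and server failures during the rebalance window; the sketch captures only the lookup-forwarding idea that keeps metadata operations proceeding at normal speed.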
Acknowledgments
This work was supported by an Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0126-15-1082, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).
Cite this article
Cha, MH., Kim, DO., Kim, HY. et al. Adaptive metadata rebalance in exascale file system. J Supercomput 73, 1337–1359 (2017). https://doi.org/10.1007/s11227-016-1812-x