Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems

Xu, Quanqing; Arumugam, Rajesh Vellore; Yong, Khai Leong; Wen, Yonggang; Ong, Yew-Soon; Xi, Weiya

doi:10.1007/s11704-015-4560-9

Research Article
Published: 16 September 2015

Volume 9, pages 904–918, (2015)

Frontiers of Computer Science

Quanqing Xu¹,
Rajesh Vellore Arumugam¹,
Khai Leong Yong¹,
Yonggang Wen²,
Yew-Soon Ong² &
…
Weiya Xi¹


Abstract

Big data is an emerging term in the storage industry; it refers to data analytics on big storage, i.e., cloud-scale storage. In cloud-scale (or EB-scale) file systems, balancing request workloads across a metadata server cluster is critical for avoiding performance bottlenecks and improving quality of service. Many approaches have been proposed for load balancing in distributed file systems. Some of them focus on global namespace balancing, making the metadata distribution across metadata servers as uniform as possible. However, they do not work well under skewed request distributions, which impair load balancing but simultaneously increase the effectiveness of caching and replication. In this paper, we propose Cloud Cache (C²), an adaptive and scalable load balancing scheme for metadata server clusters in EB-scale file systems. It combines adaptive cache diffusion with an adaptive replication scheme to cope with the request load balancing problem, and it can be integrated into existing distributed metadata management approaches to efficiently improve their load balancing performance. C² runs in two steps: 1) adaptive cache diffusion runs first: if a node is overloaded, load-shedding is used; otherwise, load-stealing is used; and 2) the adaptive replication scheme runs second: if a very popular metadata item (or at least two such items) causes a node to be overloaded, adaptive replication is used, because the very popular item cannot be split across several nodes by adaptive cache diffusion owing to its knapsack property. Performance evaluation in trace-driven simulations demonstrates the efficiency and scalability of C².
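
The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` class, the per-node `capacity` threshold, the peer-selection rules (least-loaded or most-loaded peer) and the `extra_replicas` parameter are all assumptions made for illustration. The sketch only captures the stated idea that items move between nodes as indivisible units during cache diffusion, and that an item too popular to fit anywhere is replicated so that its request rate is shared.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float                             # assumed per-node request budget
    items: dict = field(default_factory=dict)   # item id -> request rate

    @property
    def load(self) -> float:
        return sum(self.items.values())


def move_fitting_items(src, dst):
    """Move whole items from src to dst while src is overloaded and dst has room."""
    for item, rate in sorted(src.items.items(), key=lambda kv: -kv[1]):
        if src.load <= src.capacity:
            break
        if dst.load + rate <= dst.capacity:
            dst.items[item] = dst.items.get(item, 0.0) + src.items.pop(item)


def diffuse(nodes):
    """Step 1 (assumed form): overloaded nodes shed load, the others steal it."""
    for node in nodes:
        others = [n for n in nodes if n is not node]
        if node.load > node.capacity:
            # Load-shedding: push items to the least-loaded peer.
            move_fitting_items(node, min(others, key=lambda n: n.load))
        else:
            # Load-stealing: pull items from the most-loaded peer
            # (a no-op when that peer is not overloaded).
            move_fitting_items(max(others, key=lambda n: n.load), node)


def replicate(nodes, extra_replicas=1):
    """Step 2 (assumed form): share the hottest item of a still-overloaded node."""
    for src in nodes:
        if src.load <= src.capacity:
            continue
        item = max(src.items, key=src.items.get)
        others = [n for n in nodes if n is not src]
        peers = sorted(others, key=lambda n: n.load)[:extra_replicas]
        share = src.items[item] / (len(peers) + 1)
        src.items[item] = share
        for peer in peers:
            peer.items[item] = peer.items.get(item, 0.0) + share


# Hypothetical workload: one item ("hot") exceeds the capacity of any single node.
a = Node("a", 100.0, {"hot": 150.0, "x": 40.0, "y": 30.0})
b = Node("b", 100.0, {"z": 20.0})
c = Node("c", 100.0)
nodes = [a, b, c]

diffuse(nodes)     # x and y move to c; "hot" fits nowhere as a whole
replicate(nodes)   # "hot" is shared between a and b
for n in nodes:
    print(n.name, n.load, sorted(n.items))
```

In this toy run, diffusion alone cannot relieve node `a`, because the item `hot` is larger than any peer's remaining capacity; only replication brings every node under its threshold. A single pass of each step is shown; a real scheme would presumably iterate and adapt the replica count to the observed load.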



Author information

Authors and Affiliations

Data Storage Institute, Agency for Science, Technology and Research, Singapore, 138632, Singapore
Quanqing Xu, Rajesh Vellore Arumugam, Khai Leong Yong & Weiya Xi
School of Computer Engineering, Nanyang Technological University, Singapore, 639798, Singapore
Yonggang Wen & Yew-Soon Ong

Corresponding author

Correspondence to Quanqing Xu.

Additional information

Quanqing Xu received his PhD in computer science from Peking University, China. He is currently a research scientist at Data Storage Institute (DSI), Agency for Science, Technology and Research (A*STAR), Singapore. His research interests mainly include distributed systems, file systems, cloud computing and cloud storage.

Rajesh Vellore Arumugam was a senior researcher at the Data Storage Institute (DSI), Agency for Science, Technology and Research (A*STAR), Singapore. He received his MS in Electronics and Communication Engineering from Anna University, India. Currently, he is a part-time PhD student in the School of Computer Engineering, Nanyang Technological University, Singapore.

Khai Leong Yong received his BS in electrical and electronics engineering and his PhD in communication software and networks from the National University of Singapore, Singapore. He is currently a division manager of the Data Storage Institute (DSI), Agency for Science, Technology and Research (A*STAR), Singapore. In his role with DSI, Khai Leong leads a team of research scientists and engineers in developing data and storage technologies for next generation data centers.

Yonggang Wen is an assistant professor with the School of Computer Engineering at Nanyang Technological University, Singapore. He received his PhD in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), USA. His research interests include cloud computing, green data centers, big data analytics, multimedia networks and mobile computing.

Yew-Soon Ong received his PhD on Artificial Intelligence in complex design from the Computational Engineering and Design Center, University of Southampton, UK in 2003. He is currently an associate professor and director of the Agency for Science, Technology and Research (A*STAR) SIMTech-NTU Joint Lab on Complex Systems and Programme at Nanyang Technological University, Singapore. His current research interest in computational intelligence spans memetic computation, evolutionary design, machine learning and big data.

Weiya Xi is a scientist working in the Data Center Technology Division, Data Storage Institute (DSI), Agency for Science, Technology and Research (A*STAR), Singapore. She received her BE from the Beijing University of Aeronautics & Astronautics, China, and her ME, MComp and PhD degrees from the National University of Singapore, Singapore. Her research interests include storage system simulation, erasure codes, file systems and distributed storage systems.

About this article

Cite this article

Xu, Q., Arumugam, R.V., Yong, K.L. et al. Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems. Front. Comput. Sci. 9, 904–918 (2015). https://doi.org/10.1007/s11704-015-4560-9

Received: 09 December 2014
Accepted: 22 May 2015
Published: 16 September 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s11704-015-4560-9
