DOI: 10.1145/1651263.1651271
research-article

Hadoop high availability through metadata replication

Published: 02 November 2009

Abstract

Hadoop is widely adopted to support data-intensive distributed applications. Many of these applications are mission-critical and require Hadoop itself to be highly available. Unfortunately, Hadoop has no high-availability support yet, and adding it is not trivial. Based on a thorough investigation of Hadoop, this paper proposes a metadata-replication-based solution that enables Hadoop high availability by removing the single point of failure in Hadoop. The solution involves three major phases. In the initialization phase, each standby/slave node registers with the active/primary node, and its initial metadata (such as the version file and file system image) is caught up with that of the active/primary node. In the replication phase, the runtime metadata needed for a future failover (such as outstanding operations and lease states) is replicated. In the failover phase, the standby/newly elected primary node takes over all communications. The solution presents several features that are unique for Hadoop, such as a runtime-configurable synchronization mode. The experiments demonstrate the feasibility and efficiency of our solution.
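The three phases can be pictured with a small in-memory sketch. This is not the paper's implementation and uses no Hadoop API; the names `Node`, `Cluster`, `SyncMode`, `register`, `apply`, `flush`, `set_mode`, and `failover` are all hypothetical, the metadata is reduced to two dictionaries, and "election" is simplified to promoting the first registered standby.

```python
from dataclasses import dataclass, field
from enum import Enum


class SyncMode(Enum):
    SYNC = "sync"    # ship each operation to standbys before returning
    ASYNC = "async"  # queue operations; standbys catch up on flush()


@dataclass
class Node:
    name: str
    image: dict = field(default_factory=dict)   # stand-in for the file system image
    leases: dict = field(default_factory=dict)  # stand-in for lease states
    is_primary: bool = False


class Cluster:
    """Toy model of one primary and its standbys (hypothetical, illustrative only)."""

    def __init__(self, primary: Node, mode: SyncMode = SyncMode.SYNC):
        primary.is_primary = True
        self.primary = primary
        self.standbys = []
        self.mode = mode
        self.backlog = []  # operations not yet shipped while in ASYNC mode

    def register(self, standby: Node) -> None:
        # Initialization phase: register the standby and copy initial metadata.
        standby.image = dict(self.primary.image)
        standby.leases = dict(self.primary.leases)
        self.standbys.append(standby)

    def set_mode(self, mode: SyncMode) -> None:
        # Runtime-configurable mode; flush first so operation order is preserved.
        self.flush()
        self.mode = mode

    def apply(self, path: str, holder: str) -> None:
        # Replication phase: apply on the primary, then ship or queue the operation.
        op = (path, holder)
        self._apply_to(self.primary, op)
        if self.mode is SyncMode.SYNC:
            for standby in self.standbys:
                self._apply_to(standby, op)
        else:
            self.backlog.append(op)

    def flush(self) -> None:
        # Ship every queued operation to every standby, in order.
        for op in self.backlog:
            for standby in self.standbys:
                self._apply_to(standby, op)
        self.backlog.clear()

    def failover(self) -> Node:
        # Failover phase: promote the first standby (assumed election result).
        if not self.standbys:
            raise RuntimeError("no standby available for failover")
        self.primary.is_primary = False
        self.primary = self.standbys.pop(0)
        self.primary.is_primary = True
        return self.primary

    @staticmethod
    def _apply_to(node: Node, op: tuple) -> None:
        path, holder = op
        node.image[path] = "file"
        node.leases[path] = holder
```

In this sketch, operations still sitting in `backlog` when `failover()` is called never reach the new primary, which illustrates the usual trade-off between the two modes: synchronous replication costs latency on every operation, while asynchronous replication risks losing the most recent ones.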

Published In

CloudDB '09: Proceedings of the first international workshop on Cloud data management
November 2009
62 pages
ISBN: 9781605588025
DOI: 10.1145/1651263
Publisher

Association for Computing Machinery

New York, NY, United States
Author Tag

  1. hadoop

Qualifiers

  • Research-article

Conference

CIKM '09

Acceptance Rates

CloudDB '09 Paper Acceptance Rate 8 of 11 submissions, 73%;
Overall Acceptance Rate 12 of 17 submissions, 71%

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1
Reflects downloads up to 05 Mar 2025.

Cited By

  • (2023) Interminable Flows: A Generic, Joint, Customizable Resiliency Model for Big-Data Streaming Platforms. IEEE Access 11 (10762-10776). DOI: 10.1109/ACCESS.2023.3239365. Online publication date: 2023
  • (2023) Performance Enhancement of Distributed System Using HDFS Federation and Sharding. Procedia Computer Science 218:C (2830-2841). DOI: 10.1016/j.procs.2023.01.254. Online publication date: 1-Jan-2023
  • (2022) Multi-Framework Reliability Approach. IEEE Transactions on Cloud Computing 10:4 (2750-2768). DOI: 10.1109/TCC.2021.3065906. Online publication date: 1-Oct-2022
  • (2021) Reliability of Large Scale GPU Clusters for Deep Learning Workloads. Companion Proceedings of the Web Conference 2021 (179-181). DOI: 10.1145/3442442.3452056. Online publication date: 19-Apr-2021
  • (2021) Metadata Replication with Synchronous OpCodes Writing for Namenode Multiplexing in Hadoop. 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS) (1-7). DOI: 10.1109/IEMTRONICS52119.2021.9422639. Online publication date: 21-Apr-2021
  • (2019) Edge-to-Edge Resource Discovery using Metadata Replication. 2019 IEEE 3rd International Conference on Fog and Edge Computing (ICFEC) (1-6). DOI: 10.1109/CFEC.2019.8733149. Online publication date: May-2019
  • (2019) MapReduce: an infrastructure review and research insights. The Journal of Supercomputing. DOI: 10.1007/s11227-019-02907-5. Online publication date: 8-Jun-2019
  • (2019) Prefetching-based metadata management in Advanced Multitenant Hadoop. The Journal of Supercomputing 75:2 (533-553). DOI: 10.1007/s11227-017-2019-5. Online publication date: 1-Feb-2019
  • (2019) Computational grid scheduling architecture using MapReduce model-based non-dominated sorting genetic algorithm. Soft Computing. DOI: 10.1007/s00500-019-03946-z. Online publication date: 27-Mar-2019
  • (2019) A Data Management Scheme for Micro-Level Modular Computation-Intensive Programs in Big Data Platforms. Data Management and Analysis (135-153). DOI: 10.1007/978-3-030-32587-9_9. Online publication date: 21-Dec-2019
