Abstract:
Cloud computing and big data technologies have gained great popularity in recent years. MapReduce remains one of the most efficient and widely adopted computing paradigms for providing big data services. MapReduce applications must run on cloud platforms, where failures are inevitable. Hadoop is the de facto implementation of MapReduce, but its fault tolerance is coarse-grained and unsatisfactory: a failed task is rescheduled and re-executed from the very beginning, which adds considerable overhead to failure recovery and heavily delays the whole job when failures occur. In this paper, we propose a novel multi-trigger checkpointing approach for fast recovery of MapReduce tasks, named Multi-trigger Checkpointing Tactic for fAst TAsk Recovery (McTAR). As a finer-grained fault tolerance tactic, McTAR combines multi-trigger checkpoint generation, push-pull combined intermediate data distribution, and optimized failure task prediction so that a recovery task attempt can start from a specific point of progress determined by the valid checkpoint for its intermediate data. In this way, McTAR can effectively speed up the recovery of MapReduce jobs and substantially reduce task recovery delay.
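The general idea of resuming a failed task from a checkpoint, rather than from scratch, can be illustrated with a minimal Python sketch. It is not McTAR's implementation: the abstract does not say which triggers are used or what a checkpoint contains, so the `Checkpoint` class, the `run_map_task` function, and the two triggers (`every_n` records processed, `max_buffered` output items) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: input offset plus the output persisted so far."""
    offset: int = 0
    output: list = field(default_factory=list)


def run_map_task(records, map_fn, checkpoint, every_n=3, max_buffered=4, fail_at=None):
    """Process records starting at checkpoint.offset and return how many were processed.

    Progress is persisted whenever either assumed trigger fires:
    `every_n` records since the last checkpoint, or `max_buffered` buffered outputs.
    `fail_at` simulates a task failure just before the record at that index.
    """
    buffered = []      # output produced since the last checkpoint (lost on failure)
    since_ckpt = 0     # records processed since the last checkpoint
    processed = 0
    for i in range(checkpoint.offset, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at record {i}")
        buffered.extend(map_fn(records[i]))
        since_ckpt += 1
        processed += 1
        if since_ckpt >= every_n or len(buffered) >= max_buffered:
            # A trigger fired: persist the buffered output and the new offset.
            checkpoint.output.extend(buffered)
            checkpoint.offset = i + 1
            buffered, since_ckpt = [], 0
    # Task finished: flush whatever is still buffered.
    checkpoint.output.extend(buffered)
    checkpoint.offset = len(records)
    return processed
```

In this sketch, a recovery attempt that is handed the same `Checkpoint` object reprocesses only the records after the last persisted offset, whereas an attempt started with a fresh `Checkpoint()` corresponds to the restart-from-scratch behavior the abstract attributes to Hadoop.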
Notes: IEEE Xplore Notice to Reader "McTAR: A Multi-trigger Checkpointing Tactic for Fast Task Recovery in MapReduce" by Jing Liu, Peng Wang, Jiantao Zhou, and Keqin Li published in IEEE Transactions on Services Computing Early Access Digital Object Identifier: 10.1109/TSC.2019.2904270 This article includes an author who was prohibited from publishing with IEEE prior to publication of the article. Due to this prohibition, reasonable effort should be made to remove all past references to this article, and refrain from future references to this article. We regret any inconvenience this may have caused.
Published in: IEEE Transactions on Services Computing (Volume: 14, Issue: 6, 01 Nov.-Dec. 2021)