ABSTRACT
Debugging distributed programs such as MapReduce programs is a difficult task, which is why prior studies focus on finding and fixing bugs in the early stages of program development. Delta debugging finds a minimal failing input for a sequential program by dividing its input into subsets and testing these subsets one by one, but no prior work finds minimal failing inputs for distributed programs such as MapReduce programs. In this paper, we present MapRedDD, a framework that efficiently finds minimal failing inputs for MapReduce programs. MapRedDD employs a failing-input selection technique that identifies the failing input subset in a single run of the MapReduce program over multiple input subsets, instead of testing each subset in a separate run. This reduces the number of executions of the MapReduce program and avoids the per-run overhead of job submission, job scheduling, and final outcome retrieval. For a MapReduce program with N inputs, our technique finds the minimal failing input in at most N executions, as opposed to the up to 2N - 1 executions, one per input subset, required in the worst case by a binary-search delta debugging algorithm.
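To make the contrast concrete, the sketch below simulates the search for a minimal failing input. It is a minimal illustration, not the MapRedDD API: all names in it (FailingInputSelector, runOnceOverSubsets, the failure predicate) are hypothetical. Each round evaluates both halves of the current failing input in one batched call, standing in for a single MapReduce run over multiple tagged subsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Minimal sketch (hypothetical names, not the MapRedDD API) of finding
 * a minimal failing input by binary subdivision. Each round evaluates
 * BOTH halves of the current failing input in one batched call,
 * standing in for a single MapReduce run over multiple tagged subsets.
 */
public final class FailingInputSelector {

    /** Simulates one distributed run that tests several subsets at once. */
    static <T> List<Boolean> runOnceOverSubsets(List<List<T>> subsets,
                                                Predicate<List<T>> fails) {
        List<Boolean> verdicts = new ArrayList<>();
        for (List<T> subset : subsets) {
            // In MapRedDD, each record would carry a subset tag and the job
            // would report one pass/fail verdict per tag; here we simply
            // evaluate the failure predicate on each subset directly.
            verdicts.add(fails.test(subset));
        }
        return verdicts;
    }

    /** Narrows a failing input down to a small failing subset. */
    static <T> List<T> minimize(List<T> input, Predicate<List<T>> fails) {
        List<T> current = input;
        while (current.size() > 1) {
            int mid = current.size() / 2;
            List<List<T>> halves = List.of(
                current.subList(0, mid), current.subList(mid, current.size()));
            // One batched "run" yields a verdict for every subset at once.
            List<Boolean> verdicts = runOnceOverSubsets(halves, fails);
            if (verdicts.get(0)) {
                current = halves.get(0);
            } else if (verdicts.get(1)) {
                current = halves.get(1);
            } else {
                break; // Failure needs records from both halves; stop here.
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Integer> records = List.of(3, 8, 1, 42, 7, 5);
        // Hypothetical failure condition: the job fails whenever the
        // input contains the "poison" record 42.
        Predicate<List<Integer>> fails = subset -> subset.contains(42);
        System.out.println("Minimal failing input: " + minimize(records, fails));
    }
}
```

In a full delta debugging algorithm, a round in which neither half fails would increase the granularity and retest smaller subsets and their complements rather than stop; the sketch omits that step for brevity.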
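On the MapReduce side, the single-run evaluation can be pictured as tagging every input record with the id of the subset it belongs to, so that the reduce side can report one pass/fail verdict per tag within the same job. The mapper below is a hypothetical sketch of that tagging step; the job property name mapredd.subset.count and the offset-based subset assignment are assumptions for illustration, not the actual MapRedDD implementation.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical illustration only: tags each input record with the id of
 * the subset it belongs to, so a single job run can produce one
 * pass/fail verdict per subset on the reduce side. The property name
 * "mapredd.subset.count" and the offset-based assignment are assumed
 * for this sketch, not taken from MapRedDD.
 */
public class SubsetTaggingMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private int subsetCount;

    @Override
    protected void setup(Context context) {
        // Number of subsets under test in this run (assumed property name).
        subsetCount = context.getConfiguration()
                             .getInt("mapredd.subset.count", 2);
    }

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        // Assign the record to a subset (here, simply by its byte offset);
        // a reducer keyed on the tag can then check each subset's outcome
        // independently within the same job.
        long subsetId = offset.get() % subsetCount;
        context.write(new Text("subset-" + subsetId), record);
    }
}
```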
REFERENCES
- Apache MRUnit. 2018. https://mrunit.apache.org/.
- Mockito. 2018. https://code.google.com/p/mockito/.
- PowerMock. 2018. https://code.google.com/p/powermock/.
- Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 1 (2008), 107--113.
- Tom White. 2012. Hadoop: The Definitive Guide. O'Reilly Media, Inc.
- Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and Isolating Failure-Inducing Input. IEEE Transactions on Software Engineering 28, 2 (2002), 183--200.