ABSTRACT
Debugging distributed programs such as MapReduce programs is a difficult task, which is why prior studies focus on finding and fixing bugs in the early stages of program development. Delta debugging finds a minimal failing input for a sequential program by dividing its input into subsets and testing these subsets one by one, but no prior work finds minimal failing inputs for distributed programs such as MapReduce programs. In this paper, we present MapRedDD, a framework that efficiently finds minimal failing inputs for MapReduce programs. MapRedDD employs a failing-input selection technique that identifies the failing input subset in a single run of the MapReduce program over multiple input subsets, instead of testing each subset in a separate run. This reduces the number of executions of the MapReduce program and avoids the per-run overhead of job submission, job scheduling, and final outcome retrieval. For a MapReduce program with N inputs, our technique finds the minimal failing input in at most N executions, as opposed to the up to 2N - 1 executions, one per input subset, required in the worst case by a binary-search delta debugging algorithm.
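To make the contrast concrete, the sketch below simulates the search for a minimal failing input. It is a minimal illustration, not the MapRedDD API: all names in it (FailingInputSelector, runOnceOverSubsets, the failure predicate) are hypothetical. Each round evaluates both halves of the current failing input in one batched call, standing in for a single MapReduce run over multiple tagged subsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Minimal sketch (hypothetical names, not the MapRedDD API) of finding
 * a minimal failing input by binary subdivision. Each round evaluates
 * BOTH halves of the current failing input in one batched call,
 * standing in for a single MapReduce run over multiple tagged subsets.
 */
public final class FailingInputSelector {

    /** Simulates one distributed run that tests several subsets at once. */
    static <T> List<Boolean> runOnceOverSubsets(List<List<T>> subsets,
                                                Predicate<List<T>> fails) {
        List<Boolean> verdicts = new ArrayList<>();
        for (List<T> subset : subsets) {
            // In MapRedDD, each record would carry a subset tag and the job
            // would report one pass/fail verdict per tag; here we simply
            // evaluate the failure predicate on each subset directly.
            verdicts.add(fails.test(subset));
        }
        return verdicts;
    }

    /** Narrows a failing input down to a small failing subset. */
    static <T> List<T> minimize(List<T> input, Predicate<List<T>> fails) {
        List<T> current = input;
        while (current.size() > 1) {
            int mid = current.size() / 2;
            List<List<T>> halves = List.of(
                current.subList(0, mid), current.subList(mid, current.size()));
            // One batched "run" yields a verdict for every subset at once.
            List<Boolean> verdicts = runOnceOverSubsets(halves, fails);
            if (verdicts.get(0)) {
                current = halves.get(0);
            } else if (verdicts.get(1)) {
                current = halves.get(1);
            } else {
                break; // Failure needs records from both halves; stop here.
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Integer> records = List.of(3, 8, 1, 42, 7, 5);
        // Hypothetical failure condition: the job fails whenever the
        // input contains the "poison" record 42.
        Predicate<List<Integer>> fails = subset -> subset.contains(42);
        System.out.println("Minimal failing input: " + minimize(records, fails));
    }
}
```

In a full delta debugging algorithm, a round in which neither half fails would increase the granularity and retest smaller subsets and their complements rather than stop; the sketch omits that step for brevity.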
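On the MapReduce side, the single-run evaluation can be pictured as tagging every input record with the id of the subset it belongs to, so that the reduce side can report one pass/fail verdict per tag within the same job. The mapper below is a hypothetical sketch of that tagging step; the job property name mapredd.subset.count and the offset-based subset assignment are assumptions for illustration, not the actual MapRedDD implementation.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Hypothetical illustration only: tags each input record with the id of
 * the subset it belongs to, so a single job run can produce one
 * pass/fail verdict per subset on the reduce side. The property name
 * "mapredd.subset.count" and the offset-based assignment are assumed
 * for this sketch, not taken from MapRedDD.
 */
public class SubsetTaggingMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private int subsetCount;

    @Override
    protected void setup(Context context) {
        // Number of subsets under test in this run (assumed property name).
        subsetCount = context.getConfiguration()
                             .getInt("mapredd.subset.count", 2);
    }

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        // Assign the record to a subset (here, simply by its byte offset);
        // a reducer keyed on the tag can then check each subset's outcome
        // independently within the same job.
        long subsetId = offset.get() % subsetCount;
        context.write(new Text("subset-" + subsetId), record);
    }
}
```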
REFERENCES
- Apache MRUnit. 2018. https://mrunit.apache.org/.
- Mockito. 2018. https://code.google.com/p/mockito/.
- PowerMock. 2018. https://code.google.com/p/powermock/.
- Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 1 (2008), 107--113.
- Tom White. 2012. Hadoop: The Definitive Guide. O'Reilly Media, Inc.
- Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and Isolating Failure-Inducing Input. IEEE Transactions on Software Engineering 28, 2 (2002), 183--200.