Abstract
To provide timely results for ‘Big Data Analytics’, it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approach, named Dart, that meets a given deadline and, when only part of the dataset can be processed before the time limit, maximizes the amount of input processed. Dart uses an iterative estimation method, based on both historical data and the running status of the job, to precisely estimate the job completion time in real time. By comparing the estimated time with the deadline constraint, a YARN-based task scheduler dynamically decides whether to continue or terminate the map phase. We have validated our approach using workloads from OpenCloud and Facebook on a cluster of 60 virtual machines. The results show that Dart not only meets the deadline effectively but also processes near-maximal data volumes, even when the deadline is extremely tight and only limited resources are allocated.
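The continue-or-terminate decision described above can be illustrated with a minimal sketch. This is not Dart's actual estimator or its YARN integration, neither of which is specified in the abstract. The names `JobStatus`, `estimate_map_task_time`, and `should_continue_map_phase`, the blending weight, and the one-more-wave projection are all hypothetical, chosen only to show how a historical prior and live task durations might be combined and checked against a deadline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobStatus:
    """Hypothetical snapshot of a running job (not a YARN data structure)."""
    elapsed_s: float                       # seconds since the job started
    completed_map_durations: List[float]   # observed durations of finished map tasks
    pending_maps: int                      # map tasks not yet started
    est_reduce_s: float                    # assumed estimate for shuffle + reduce


def estimate_map_task_time(historical_mean_s: float,
                           observed: List[float],
                           prior_strength: int = 5) -> float:
    """Blend a historical mean with observed durations.

    Assumption: the weight on observations grows as more tasks finish,
    so the estimate is refined iteratively while the job runs.
    """
    if not observed:
        return historical_mean_s
    observed_mean = sum(observed) / len(observed)
    w = len(observed) / (len(observed) + prior_strength)
    return (1 - w) * historical_mean_s + w * observed_mean


def should_continue_map_phase(status: JobStatus,
                              historical_mean_s: float,
                              deadline_s: float) -> bool:
    """Return True if one more wave of map tasks, followed by the
    reduce phase, is projected to finish within the deadline."""
    if status.pending_maps == 0:
        return False  # nothing left to map
    task_s = estimate_map_task_time(historical_mean_s,
                                    status.completed_map_durations)
    projected_finish = status.elapsed_s + task_s + status.est_reduce_s
    return projected_finish <= deadline_s


status = JobStatus(elapsed_s=50.0,
                   completed_map_durations=[20.0] * 5,
                   pending_maps=3,
                   est_reduce_s=30.0)
print(should_continue_map_phase(status, historical_mean_s=10.0, deadline_s=100.0))
```

In this sketch a scheduler would call `should_continue_map_phase` each time a map task completes; once it returns `False`, the remaining input is skipped and the job moves to the reduce phase with whatever data has been mapped so far.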
Acknowledgments
This work is sponsored in part by the National Natural Science Foundation of China under Grant No. 61572510, the National Natural Science Foundation of China under Grant No. 61402490, and the National Basic Research Program of China (973) under Grant No. 2014CB340303.
This work is also supported by the National Basic Research Program of China under Grant No. 2011CB302601.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, M., Wang, C., You, P., Huang, Z., Peng, Y. (2015). Deadline-Oriented Task Scheduling for MapReduce Environments. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_25
DOI: https://doi.org/10.1007/978-3-319-27122-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science, Computer Science (R0)