Meeting deadlines for approximation processing in MapReduce environments

Hu, Ming-hao; Wang, Chang-jian; Peng, Yu-xing

doi:10.1631/FITEE.1601056

Published: 18 January 2018

Volume 18, pages 1754–1772 (2017)

Frontiers of Information Technology & Electronic Engineering

Abstract

To provide timely results for big data analytics, it is crucial to satisfy deadline requirements for MapReduce jobs in today’s production environments. Much effort has been devoted to the problem of meeting deadlines, and there are typically two kinds of solutions. The first allocates appropriate resources to complete the entire job before the specified time limit, but deadlines are still missed when the deadline constraints are tight or resources are lacking. The second runs a pre-constructed sample chosen according to the deadline constraints, which satisfies the time requirement but fails to maximize the volume of processed data. In this paper, we propose a deadline-oriented task scheduling approach, named ‘Dart’, to address these problems. Given a specified deadline and restricted resources, Dart uses an iterative estimation method, based on both historical data and job running status, to precisely estimate the real-time job completion time. Based on the estimated time, Dart uses an approach–revise algorithm to make dynamic scheduling decisions that meet deadlines while maximizing the amount of processed data and mitigating stragglers. Dart also efficiently handles task failures and data skew, so that they do not harm its performance. We have validated our approach using workloads from OpenCloud and Facebook on a cluster of 64 virtual machines. The results show that Dart not only meets the deadline effectively but also processes near-maximum volumes of data, even with tight deadlines and limited resources.
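The general idea of deadline-driven approximate scheduling can be illustrated with a minimal sketch. This is not Dart's algorithm, whose details are not given in the abstract; it only assumes that a per-task duration is estimated by blending a historical mean with durations observed so far, and that tasks run in equal-length waves over a fixed number of slots. All function names, the `weight` smoothing factor, and the wave model are hypothetical.

```python
import math


def estimate_task_time(historical_mean, observed, weight=0.5):
    """Blend a historical mean task duration with durations observed so far.

    `weight` is a hypothetical smoothing factor, not a parameter from the paper.
    """
    if not observed:
        return historical_mean
    running_mean = sum(observed) / len(observed)
    return weight * historical_mean + (1 - weight) * running_mean


def tasks_that_fit(time_left, slots, task_time):
    """Count the tasks that can finish within `time_left` on `slots` slots,
    assuming tasks run in full waves of equal length (a simplification)."""
    if time_left <= 0 or slots <= 0 or task_time <= 0:
        return 0
    waves = math.floor(time_left / task_time)
    return waves * slots


def plan(deadline, now, slots, pending, historical_mean, observed):
    """Return how many pending tasks to schedule so the job ends by `deadline`."""
    task_time = estimate_task_time(historical_mean, observed)
    return min(pending, tasks_that_fit(deadline - now, slots, task_time))
```

Calling `plan` again each time a task completes, with the new duration appended to `observed`, revises the estimate and therefore the number of tasks (and the share of the input) that is still expected to fit before the deadline.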

Author information

Authors and Affiliations

National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, 410073, China
Ming-hao Hu, Chang-jian Wang & Yu-xing Peng

Corresponding author

Correspondence to Ming-hao Hu.

Additional information

Project supported by the National Key Research and Development Program of China (No. 2016YFB1000101)

About this article

Cite this article

Hu, Mh., Wang, Cj. & Peng, Yx. Meeting deadlines for approximation processing in MapReduce environments. Frontiers Inf Technol Electronic Eng 18, 1754–1772 (2017). https://doi.org/10.1631/FITEE.1601056

Received: 13 March 2016
Accepted: 24 June 2016
Published: 18 January 2018
Issue Date: November 2017
DOI: https://doi.org/10.1631/FITEE.1601056

Key words

CLC number

TP311
