
SAMES: deadline-constraint scheduling in MapReduce

  • Research Article
Frontiers of Computer Science

Abstract

MapReduce is a popular parallel data-processing system, and task scheduling is one of its core techniques. In many applications, users require their MapReduce jobs to complete before specific deadlines. In this paper, we therefore propose SAMES, a novel scheduling algorithm based on the most effective sequence, for deadline-constraint jobs in MapReduce. First, according to the characteristics of MapReduce, we propose a novel sequence-based execution strategy for MapReduce jobs and a new concept, the effective sequence (ES). Then, we design efficient approaches for finding ESes and choose the most effective sequence (MES) for job execution. We also propose methods for updating the MES and for handling exceptions. Finally, we verify the effectiveness of SAMES through experiments. The experimental results show that SAMES is an efficient scheduling algorithm for deadline-constraint jobs in MapReduce.
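The abstract leaves the precise definitions of ES and MES to the full paper. Purely as a hypothetical illustration of what "choosing among execution sequences under deadlines" can look like, the sketch below assumes a toy model in which jobs run one after another on the whole cluster with known runtimes, and it scores each ordering by how many deadlines it meets. The names `Job`, `evaluate`, `score`, and `best_sequence` are invented for this example; this is not the SAMES algorithm.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Job:
    # Hypothetical job model (not from the paper): `runtime` is the estimated
    # time the job needs when it has the whole cluster to itself, and
    # `deadline` is measured from time 0.
    name: str
    runtime: float
    deadline: float


def evaluate(sequence):
    """Run jobs back to back; return (deadlines met, total tardiness)."""
    clock = 0.0
    met = 0
    tardiness = 0.0
    for job in sequence:
        clock += job.runtime
        if clock <= job.deadline:
            met += 1
        else:
            tardiness += clock - job.deadline
    return met, tardiness


def score(sequence):
    # Prefer more deadlines met; among ties, prefer less total tardiness.
    met, tardiness = evaluate(sequence)
    return met, -tardiness


def best_sequence(jobs):
    """Exhaustively pick the best ordering under the toy model.

    Factorial in len(jobs); only suitable for a handful of jobs.
    """
    return max(permutations(jobs), key=score)


if __name__ == "__main__":
    jobs = [Job("A", 4, 10), Job("B", 3, 3), Job("C", 2, 6)]
    order = best_sequence(jobs)
    print([j.name for j in order], evaluate(order))
```

Exhaustive enumeration keeps the sketch easy to check but grows factorially with the number of jobs; any practical scheduler would need a much cheaper search and a richer cost model (map and reduce phases, slot counts, data locality).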




Author information


Correspondence to Xite Wang.

Additional information

Xite Wang is a PhD candidate in the College of Information Science & Engineering, Northeastern University, China, from where he also received his BS and MS in 2009 and 2011, respectively. His research interests include cloud computing and big-data management.

Derong Shen is a full professor and a PhD supervisor in the College of Information Science & Engineering, Northeastern University, China, from where she received her PhD in 2004. She received her BS and MS from Jilin University, China in 1987 and 1990, respectively. Her interests include entity search and distributed computing.

Mei Bai is a PhD candidate in the College of Information Science & Engineering, Northeastern University, China, from where she received her BS and MS in 2009 and 2011, respectively. Her research interests include sensor data management and uncertain data management.

Tiezheng Nie is an associate professor in the College of Information Science & Engineering, Northeastern University, China, from where he received his BS, MS, and PhD in 2002, 2005, and 2009, respectively. His interests include data quality and data integration.

Yue Kou is an associate professor in the College of Information Science & Engineering, Northeastern University, China, from where she also received her BS, MS, and PhD in 2002, 2005, and 2009, respectively. Her interests include entity resolution and Web data management.

Ge Yu is a full professor and a PhD supervisor in the College of Information Science & Engineering, Northeastern University, China, from where he received his BS and MS in 1982 and 1985, respectively. He received his PhD from Kyushu University, Japan, in 1996. He is a senior member of the CCF and a member of the ACM and IEEE. His interests include databases and big-data management.


About this article


Cite this article

Wang, X., Shen, D., Bai, M. et al. SAMES: deadline-constraint scheduling in MapReduce. Front. Comput. Sci. 9, 128–141 (2015). https://doi.org/10.1007/s11704-014-4138-y


  • DOI: https://doi.org/10.1007/s11704-014-4138-y
