An efficient MapReduce scheduling scheme for processing large multimedia data

Multimedia Tools and Applications

Abstract

In this paper, we propose a scheduling scheme that minimizes deadline misses for jobs with assigned deadlines when large multimedia data such as video and images are processed in MapReduce frameworks. To process assigned jobs within their time limits, the proposed scheme checks whether data locality is satisfied and whether the I/O load and deadline requirements are met. If a job would run on a node with excessive I/O load, the multimedia data can be read from a replica node instead to speed up task processing. If no node can be found because the expected job completion time would exceed the deadline, the tasks of jobs whose deadlines still have slack are temporarily paused to shorten the completion time of the urgent job. In addition, speculative tasks and hot data block replication are employed so that repeatedly pausing jobs with deadline slack, in order to process urgent jobs quickly, does not increase the overall deadline miss ratio. A speculative task redundantly assigns the same task to other nodes, takes the result from whichever node finishes first, and then cancels the remaining copies. To verify the superiority of the proposed scheme, we compare its performance with that of an existing scheme. The evaluation results show that, on average, the proposed scheme reduces completion time by 13.8 % and improves the deadline success ratio by 11 % compared with the existing scheme.
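The placement decision summarized in the abstract (data locality, I/O load, deadline feasibility, and pausing tasks of jobs with deadline slack) can be sketched roughly as follows. This is a simplified illustration under assumed cost models, not the authors' algorithm: the `Task`/`Node` fields, the thresholds `IO_THRESHOLD` and `REMOTE_PENALTY`, and the rule that an I/O-overloaded local disk costs the same as reading the block from a replica on another node are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

IO_THRESHOLD = 0.8    # hypothetical cutoff for "excessive" node I/O load (0..1)
REMOTE_PENALTY = 1.5  # hypothetical slowdown when input is read from another node

@dataclass
class Task:
    job: str
    block: str         # ID of the input data block
    proc_time: float   # estimated run time (s) when the block is read locally
    deadline: float    # absolute deadline (s)

@dataclass
class Node:
    name: str
    io_load: float                            # normalized I/O load, 0..1
    busy_until: float                         # time (s) at which queued work ends
    blocks: set = field(default_factory=set)  # block replicas stored on this node

def is_good_local(node: Node, task: Task) -> bool:
    """True if the node holds the task's block and its disk is not overloaded."""
    return task.block in node.blocks and node.io_load < IO_THRESHOLD

def expected_finish(node: Node, task: Task) -> float:
    # Assumption: an overloaded local disk is treated like a remote read,
    # i.e. the block is fetched from a replica on another node at network cost.
    factor = 1.0 if is_good_local(node, task) else REMOTE_PENALTY
    return node.busy_until + task.proc_time * factor

def select_node(task: Task, nodes: list[Node]) -> Node | None:
    """Among nodes that meet the deadline, prefer good local nodes, then the
    earliest finish. None means no node can meet the deadline."""
    feasible = [n for n in nodes if expected_finish(n, task) <= task.deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda n: (not is_good_local(n, task),
                                        expected_finish(n, task)))

def tasks_to_pause(urgent: Task, node: Node,
                   running: list[tuple[Task, float]], now: float) -> list[Task]:
    """Pick (task, remaining_s) entries on `node` to pause so that `urgent`
    meets its deadline. Only tasks whose slack can absorb the delay are paused;
    returns [] if pausing cannot save the urgent task."""
    finish = expected_finish(node, urgent)
    delay = finish - node.busy_until  # how long the urgent task occupies the node
    paused = []
    # Largest slack first: those jobs can best afford to wait.
    by_slack = sorted(running, key=lambda tr: tr[0].deadline - (now + tr[1]),
                      reverse=True)
    for task, remaining in by_slack:
        if finish <= urgent.deadline:
            break
        if task.deadline - (now + remaining) < delay:
            break  # this and every later task has too little slack
        paused.append(task)
        finish -= remaining  # pausing frees the task's remaining time
    return paused if finish <= urgent.deadline else []
```

For instance, for a 20 s task on block `b1` with a 40 s deadline, `select_node` in this sketch prefers a lightly loaded node holding `b1` (finishing at 32 s) over an idle node without the block (finishing at 30 s). Whether locality should outrank a slightly earlier finish is a design choice of this sketch; when `select_node` returns `None`, `tasks_to_pause` is the fallback that looks for running tasks with enough slack.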

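The speculative-task idea described in the abstract (run redundant copies of the same task, keep the first result, cancel the rest) can be illustrated with Python's standard `concurrent.futures` module. This is a single-machine sketch, not Hadoop's implementation: threads stand in for cluster nodes, `run_task` is a hypothetical task body, and cancellation is cooperative through a `threading.Event`, because a running thread cannot be killed and `Future.cancel()` only stops copies that have not started yet.

```python
from __future__ import annotations
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_task(node: str, work: float, cancel: threading.Event) -> tuple[str, str]:
    """Hypothetical task body: simulates `work` seconds of processing on `node`
    and stops early once another copy has already finished."""
    end = time.monotonic() + work
    while time.monotonic() < end:
        if cancel.is_set():
            return node, "cancelled"
        time.sleep(0.01)
    return node, "done"

def speculative_run(copies: dict[str, float]) -> str:
    """Launch one copy of the same task per node and return the node whose
    copy completes first; the remaining copies are asked to stop."""
    cancel = threading.Event()
    with ThreadPoolExecutor(max_workers=len(copies)) as pool:
        futures = [pool.submit(run_task, node, work, cancel)
                   for node, work in copies.items()]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        cancel.set()          # signal the slower copies to exit early
        for f in pending:
            f.cancel()        # only effective for copies that never started
        winner, _ = next(iter(done)).result()
    return winner
```

For example, `speculative_run({"slow-node": 2.0, "fast-node": 0.1})` should return `"fast-node"` after roughly 0.1 s, since the slow copy checks the cancellation flag and exits instead of running for the full 2 s.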

Figs. 1–12 (images not reproduced)



Acknowledgment

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-H8501-16-1013) supervised by the IITP (Institute for Information & communications Technology Promotion); by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2015R1D1A3A01015962); by an NRF grant funded by the Korea government (MSIP) (No. 2016R1A2B3007527); and by the Support Program for Establishment of a National Scientific Data Governance System (K-16-L03-C01-S02) of the Korea Institute of Science and Technology Information.

Author information


Corresponding author

Correspondence to Jaesoo Yoo.


About this article


Cite this article

Bok, K., Hwang, J., Lim, J. et al. An efficient MapReduce scheduling scheme for processing large multimedia data. Multimed Tools Appl 76, 17273–17296 (2017). https://doi.org/10.1007/s11042-016-4026-6


  • DOI: https://doi.org/10.1007/s11042-016-4026-6
