An efficient MapReduce scheduling scheme for processing large multimedia data

Bok, Kyoungsoo; Hwang, Jaemin; Lim, Jongtae; Kim, Yeonwoo; Yoo, Jaesoo

doi:10.1007/s11042-016-4026-6


Published: 07 October 2016

Volume 76, pages 17273–17296, (2017)

Multimedia Tools and Applications

Kyoungsoo Bok¹, Jaemin Hwang¹, Jongtae Lim¹, Yeonwoo Kim¹ & Jaesoo Yoo¹ (ORCID: orcid.org/0000-0001-9926-9947)


Abstract

In this paper, we propose a scheduling scheme that minimizes deadline misses for jobs with assigned deadlines when large multimedia data such as video and images are processed in MapReduce frameworks. To process assigned jobs within their time limits, the proposed scheme checks whether data locality is satisfied and whether the I/O load and deadline requirements are met. If a job would run on a node with excessive I/O load, the multimedia data on a replica node can be used instead to speed up task processing. If no available node can be found because the expected job completion time would exceed the deadline, tasks of jobs whose deadlines still have slack are temporarily paused to shorten the completion time of the urgent job. In addition, speculative tasks and hot data block replication are employed to keep the overall deadline miss ratio from increasing when jobs with deadline slack are repeatedly paused so that urgent jobs can be processed quickly. A speculative task is a technique that redundantly assigns the same job to other nodes, takes the result from the node that completes it first, and cancels the other copies. To verify the effectiveness of the proposed scheme, we compare its performance with that of the existing scheme. On average, the proposed scheme reduced completion time by 13.8 % and improved the deadline success ratio by 11 % compared with the existing scheme.
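The abstract describes the placement decision only in prose. The following is a minimal Python sketch of that kind of decision, not the authors' algorithm. It assumes a hypothetical threshold `IO_LOAD_LIMIT` for "excessive" I/O load, hypothetical `Node` and `Task` records, and a simple completion-time estimate (wait for the node's current work, then run). It treats any node holding a copy of the task's data block as data-local and skips overloaded ones, so a replica holder is chosen when the primary is overloaded; if no node can meet the deadline, it pauses the job on a node whose deadline slack can absorb the delay.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical threshold above which a node's I/O load counts as "excessive".
IO_LOAD_LIMIT = 0.8


@dataclass
class Node:
    name: str
    blocks: Set[str]        # IDs of data blocks (primary copies or replicas) on this node
    io_load: float          # current I/O utilisation in [0, 1]
    busy_until: float       # time at which the node's current work is expected to finish
    running_slack: float    # deadline slack of the job running here (math.inf if none)


@dataclass
class Task:
    block: str              # ID of the data block the task reads
    duration: float         # estimated processing time of the task
    deadline: float         # absolute deadline of the job that owns the task


def finish_time(node: Node, task: Task, now: float) -> float:
    """Expected completion time if the task waits for the node's current work."""
    return max(now, node.busy_until) + task.duration


def schedule(task: Task, nodes: List[Node], now: float) -> Tuple[Optional[str], str]:
    """Return (node name, action), where action is "run", "pause" or "miss"."""
    # Data locality: only nodes holding a copy of the block are considered.
    holders = [n for n in nodes if task.block in n.blocks]
    # Skip overloaded holders, so a replica holder is used when the primary is busy.
    usable = [n for n in holders if n.io_load <= IO_LOAD_LIMIT]

    # Step 1: run on the usable holder that finishes earliest, if it meets the deadline.
    on_time = [n for n in usable if finish_time(n, task, now) <= task.deadline]
    if on_time:
        best = min(on_time, key=lambda n: finish_time(n, task, now))
        return best.name, "run"

    # Step 2: no node meets the deadline by waiting. Pause the job running on a node
    # whose deadline slack can absorb the delay, and start the urgent task now.
    if now + task.duration <= task.deadline:
        pausable = [n for n in usable if n.running_slack >= task.duration]
        if pausable:
            victim = max(pausable, key=lambda n: n.running_slack)
            return victim.name, "pause"

    # Step 3: the deadline is expected to be missed.
    return None, "miss"


if __name__ == "__main__":
    cluster = [
        Node("n1", {"b1"}, io_load=0.95, busy_until=0.0, running_slack=math.inf),
        Node("n2", {"b1"}, io_load=0.30, busy_until=5.0, running_slack=4.0),
    ]
    print(schedule(Task("b1", duration=3.0, deadline=10.0), cluster, now=0.0))  # ('n2', 'run')
    print(schedule(Task("b1", duration=3.0, deadline=6.0), cluster, now=0.0))   # ('n2', 'pause')
```

The pause branch picks the node whose running job has the most slack. Speculative tasks and hot data block replication, which the scheme uses to limit the cost of repeated pauses, are not modelled in this sketch.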



Acknowledgment

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-H8501-16-1013) supervised by the IITP (Institute for Information & communication Technology Promotion), by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2015R1D1A3A01015962), by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2016R1A2B3007527), and by the Support Program for Establishment of a National Scientific Data Governance System (K-16-L03-C01-S02) of the Korea Institute of Science and Technology Information.

Author information

Authors and Affiliations

School of Information and Communication Engineering, Chungbuk National University, Chungdae-ro 1, Seowon-Gu, Cheongju, Chungbuk, 28644, South Korea
Kyoungsoo Bok, Jaemin Hwang, Jongtae Lim, Yeonwoo Kim & Jaesoo Yoo


Corresponding author

Correspondence to Jaesoo Yoo.


About this article

Cite this article

Bok, K., Hwang, J., Lim, J. et al. An efficient MapReduce scheduling scheme for processing large multimedia data. Multimed Tools Appl 76, 17273–17296 (2017). https://doi.org/10.1007/s11042-016-4026-6


Received: 11 March 2016
Revised: 10 August 2016
Accepted: 29 September 2016
Published: 07 October 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11042-016-4026-6
