Abstract
The rapid growth in the production and dissemination of big data from various sources demands ever faster data processing. In distributed big data processing systems such as cloud computing, the task scheduler maps a large set of varied tasks to a set of possibly heterogeneous computing nodes so as to raise resource efficiency and data locality and to reduce makespan. Scheduling strategies that try to achieve these goals in one pass perform worse than multi-pass strategies. To achieve higher performance, we propose MOTS, a hierarchical multi-objective task scheduling scheme that first clusters tasks using the K-means algorithm alongside a load balancing equation to increase resource efficiency, and then optimizes the clusters using evolutionary algorithms to reduce makespan. The latter step uses the state of physical machines and sends related consecutive tasks to the same physical machine to eliminate data transfer between them. We simulated and tested our scheme in CloudSim. Our experiments show approximately 10% lower makespan and 4% higher CPU efficiency compared to Mai’s reinforcement learning approach and Bugerya’s parallel implementation method. The cost of data transfer between consecutive tasks is also 10% lower than with Bugerya’s method. Given these results, and because our proposed task scheduling scheme is inspired by the iHadoop method for parallel implementation, it is suitable for use in distributed big data processing systems. Information about previous executions of tasks and the current status of computing nodes is highly influential in mapping tasks to computing nodes efficiently. Predictions of the future resource needs of tasks and the available capacities of computing nodes can complement this historical information to find a closer-to-optimal mapping, resulting in faster data processing. We will pursue this issue, and the evaluation of our proposed scheme using real data, in future work.
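The two-pass idea described above (cluster tasks first, then search for a cluster-to-machine mapping with low makespan) can be illustrated with a small sketch. This is not the MOTS implementation: the abstract does not give the load balancing equation, the evolutionary operators, or the data locality model, so the sketch substitutes a plain one-dimensional K-means over task lengths and a simple (1+1) evolutionary search. All names (`kmeans_1d`, `makespan`, `evolve_mapping`), the task lengths, and the machine speeds are hypothetical.

```python
import random

def kmeans_1d(values, k, iters=20):
    """Plain 1-D K-means over task lengths; returns a cluster index per task."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: centroids spread evenly over the value range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each task to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return labels

def makespan(assignment, cluster_load, speeds):
    """Finish time of the busiest machine for a cluster -> machine mapping."""
    busy = [0.0] * len(speeds)
    for cluster, machine in enumerate(assignment):
        busy[machine] += cluster_load[cluster] / speeds[machine]
    return max(busy)

def evolve_mapping(cluster_load, speeds, generations=500, seed=0):
    """(1+1) evolutionary search: mutate one gene, keep the child if not worse."""
    rng = random.Random(seed)
    best = [rng.randrange(len(speeds)) for _ in cluster_load]
    best_cost = makespan(best, cluster_load, speeds)
    for _ in range(generations):
        child = best[:]
        child[rng.randrange(len(child))] = rng.randrange(len(speeds))
        cost = makespan(child, cluster_load, speeds)
        if cost <= best_cost:
            best, best_cost = child, cost
    return best, best_cost

# Hypothetical workload: task lengths and two machines of different speed.
task_lengths = [4, 5, 6, 40, 42, 45, 90, 95, 100, 110]
speeds = [1.0, 2.0]

labels = kmeans_1d(task_lengths, k=3)
cluster_load = [sum(t for t, lab in zip(task_lengths, labels) if lab == c)
                for c in range(3)]
mapping, cost = evolve_mapping(cluster_load, speeds)
print("cluster loads:", cluster_load)
print("cluster -> machine:", mapping, "makespan:", cost)
```

Searching over a handful of clusters rather than over individual tasks keeps the second pass small, which is one plausible reason a multi-pass scheme can beat a one-pass one; the sketch makes no claim about the figures reported in the paper.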










References
Singh T, Srivastava DK, Aggarwal A (2017) A novel approach for CPU utilization on a multicore paradigm using parallel quicksort. In: IEEE International Conference on Computational Intelligence and Communication Technology. pp. 1–6
Gao Ch, Ma J, Shen Y, Li T, Li F, Gao Y (2019) Cloud computing task scheduling based on improved differential evolutionary. IEEE Int Conf Netw Network Appl. https://doi.org/10.1109/NaNA.2019.00084
Jena RK (2015) Multi objective task scheduling in cloud environment using nested PSO framework. Proc Comput Sci 57:1219–1227. https://doi.org/10.1016/j.procs.2015.07.419
Arunarani A, Manjula D, Sugumaran V (2019) Task scheduling techniques in cloud computing: a literature survey. J Future Gener Comput Syst 91:407–415
Elnikety E, Elsayed T, Ramadan HE (2011) iHadoop: asynchronous iterations for MapReduce. In: Third IEEE International Conference on Cloud Computing Technology and Science. pp. 81–90
Mai L, Dao N, Park M (2018) Real-time task assignment approach leveraging reinforcement learning with evolutionary strategies for long-term latency minimization in fog computing. J Sensors. pp. 1–19
Bugerya AB, Kim ES, Solovev MA (2019) Parallelization of implementations of purely sequential algorithms. J Program Comput Softw 7:381–389
Tian Q, Li J, Xue D, Wu W, Wang J, Chen L, Wang J (2020) A hybrid task scheduling algorithm based on task clustering. J Mobile Netw Appl. https://doi.org/10.1007/s11036-019-01356-x, pp. 1–10
Abualigah L, Diabat A (2020) A novel hybrid AntLion optimization algorithm for multi-objective task. J Clust Comput. https://doi.org/10.1007/s10586-020-03075-5, pp. 1–19
Narayanan D, Santhanam K, Kazhamiaka F, Phanishayee A, Zaharia M (2020) Heterogeneity-aware cluster scheduling policies for deep learning workloads. In: http://arxiv.org/abs/2008.09213v1 pp. 1–19
Azumah KK, Kosta S, Sorensen LT (2018) Scheduling in the hybrid cloud constrained by process mining. In: IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
Azumah KK, Sorensen LT, Montella R, Kosta S (2020) Process mining-constrained scheduling in the hybrid cloud. J WILEY. https://doi.org/10.1002/cpe.6025, pp. 1–20
Rezaei J (2015) Best-worst multi-criteria decision-making method. Omega 53:49–57. https://doi.org/10.1016/j.omega.2014.11.009
Alhubaishy A, Aljuhani A (2020) The best-worst method for resource allocation and task scheduling in cloud computing. J IEEE Xplore. 978-1-7281-4213-5/20, pp. 1–6
Ullah I, Youn HY (2020) Task classification and scheduling based on K-Means clustering for edge computing. J Wireless Personal Commun. https://doi.org/10.1007/s11277-020-07343-w
Suresh S, Mani V, Omkar SN, Kim HJ (2006) Divisible load scheduling in distributed system with buffer constraints: genetic algorithm and linear programming approach. Int J Parallel Emerg Distrib Syst 21(5):303–321
Velliangiri S, Karthikeyan P, Arul Xavier VM, Baswaraj D (2021) Hybrid electro search with genetic algorithm for task scheduling in cloud computing. Ain Shams Eng J 12(1):631–639. https://doi.org/10.1016/j.asej.2020.07.003
Motlagh AA, Movaghar A, Rahmani AM (2019) Task scheduling mechanisms in cloud computing: a systematic review. J WILEY. https://doi.org/10.1002/dac.4302, pp. 1–23
Silva EC, Gabriel PHR (2020) A comprehensive review of evolutionary algorithms for multiprocessor dag scheduling. J Comput 26:1–16
Ggasemnezhad SMK, Rahmani AAH, Saemi B, Babazadeh M, Sangaiah AK, Bian G (2019) An enhancement of task scheduling in cloud computing based on imperialist competitive algorithm and firefly algorithm. J Supercomput. https://doi.org/10.1007/s11227-019-02816-7
Sharma P, Shilakari S, Chourasia U, Dixit P, Pandey A (2020) A survey on various types of task scheduling algorithm in cloud computing environment. Int J Sci Technol Res 1:1513–1521
Yin S, Bao J, Li J, Zhang J (2019) Real-time task processing method based on edge computing. J Front Mech Eng. https://doi.org/10.1007/s11465-019-0542-1, no. 3, pp. 320–331
Utrera G, Farreras M, Fornes J (2019) Task packing: efficient task scheduling in unbalanced parallel programs to maximize CPU utilization. J Parallel Distributed Comput 134:37–49
Bulchandani N, Chourasia U, Agrawal S, Dixit P, Pandey A (2020) A survey on task scheduling algorithms in cloud. Int J Sci Technol Res 1:460–464
Liang B, Dong X, Wang Y, Zhang X (2020) A low-power task scheduling algorithm for heterogeneous cloud computing. J Supercomput. https://doi.org/10.1007/s11227-020-03163-8, pp. 1–25
Aljarah I, Ludwig SA (2012) Parallel particle swarm optimization clustering algorithm based on mapreduce methodology. J IEEE, pp 1–8
Jalalian Z, Sharifi M (2017) Autonomous task scheduling for fast big data processing. In: TopHPC Conference, pp. 1–4, 2018
Wang S, Li Y, Pang S, Lu Q, Wang S, Zhao J (2020) A task scheduling strategy in edge-cloud collaborative scenario based on deadline. J Sci Program. https://doi.org/10.1155/2020/3967847, pp. 1–9
Tsai F, Huang C-H, Lin MH (2021) An optimal task assignment strategy in cloud-fog computing environment. J Appl Sci. https://doi.org/10.3390/app11041909
Singh H, Tyagi S, Kumar P (2021) Comparative analysis of various simulations tools used in a cloud environment for task-resource mapping. In: International Conference on Paradigms of Computing, Communication and Data Sciences, Algorithms for Intelligent Systems. https://doi.org/10.1007/978-981-15-7533-4_32
Rodriguez MA, Buyya R (2018) Scheduling dynamic workloads in multi-tenant scientific workflow as a service platforms. J Future Gener Comput Syst 79:739–750. https://doi.org/10.1016/j.future.2017.05.009
Bulaja D, Bozic K, Penevski N, Dzakula NB (2019) Introduction to Cloudsim. J Adv Comput Cloud Comput. https://doi.org/10.15308/Sinteza, pp. 189–194
Acknowledgements
We thank the anonymous reviewers of the journal, whose valuable comments helped us make the revised version of the paper stronger.
Cite this article
Jalalian, Z., Sharifi, M. A hierarchical multi-objective task scheduling approach for fast big data processing. J Supercomput 78, 2307–2336 (2022). https://doi.org/10.1007/s11227-021-03960-9
DOI: https://doi.org/10.1007/s11227-021-03960-9