Replica-aware task scheduling and load balanced cache placement for delay reduction in multi-cloud environment

The Journal of Supercomputing

Abstract

With the development of content-sharing and collaborative computing services such as online social networks and scientific workflows, huge amounts of data are being generated. To process this data, multi-cloud systems, which integrate multiple clouds to provide a unified service in a collaborative manner, have been introduced. However, task scheduling in such a heterogeneous multi-cloud environment is very challenging. To reduce the response delay caused by cross-data-center file access, we propose a replica-aware task scheduling algorithm based on data replication. To speed up data access in multi-cloud cooperative caches, we present a load balanced cache placement algorithm based on Bayesian networks. The scheduling algorithm combines transferring computation with transferring data, and resource matching is performed according to node locality. Only the input data of non-local unassigned and failed map tasks is replicated and transferred in advance to target nodes to expedite task execution. In the cache placement method, the next task to be executed is predicted with Bayesian networks. Cache prefetching files are selected according to caching profit and recycling cost, and the target placement node for each prefetching file is determined according to load balancing. Extensive experimental results show that the proposed replica-aware task scheduling algorithm outperforms benchmark scheduling algorithms in terms of node locality ratio and job response time, and that the load balanced cache placement algorithm outperforms the baseline caching algorithms in terms of prefetching hit ratio and execution time saving ratio.
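
The two cache placement steps named in the abstract (selecting prefetching files by caching profit and recycling cost, then choosing a target node by load balancing) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no formulas, so the net-benefit score, the cache budget, the load-ratio rule, and every name below (`Candidate`, `select_prefetch_files`, `place_load_balanced`) are assumptions made for illustration. The Bayesian-network prediction step is not modeled; its output is assumed to arrive as a plain dictionary of access probabilities.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A file that might be prefetched (all fields are hypothetical)."""
    name: str
    size: int               # cache space the file would occupy
    caching_profit: float   # assumed: access time saved if the file is hit
    recycling_cost: float   # assumed: cost of evicting data to make room


def select_prefetch_files(candidates, hit_probability, budget):
    """Keep files whose expected profit exceeds their recycling cost.

    hit_probability stands in for the predictor's output: a mapping from
    file name to the probability that the next task reads that file.
    Files are taken greedily, best net benefit first, within the budget.
    """
    scored = []
    for c in candidates:
        p = hit_probability.get(c.name, 0.0)
        net = p * c.caching_profit - c.recycling_cost
        if net > 0:
            scored.append((net, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, used = [], 0
    for _, c in scored:
        if used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen


def place_load_balanced(files, node_load, node_capacity):
    """Place each file on the feasible node with the lowest resulting load ratio."""
    load = dict(node_load)  # work on a copy so the caller's view is unchanged
    placement = {}
    for f in files:
        feasible = [n for n in load if load[n] + f.size <= node_capacity[n]]
        if not feasible:
            continue  # no node can hold this file; skip it
        target = min(feasible, key=lambda n: (load[n] + f.size) / node_capacity[n])
        placement[f.name] = target
        load[target] += f.size
    return placement


candidates = [
    Candidate("a", size=4, caching_profit=10.0, recycling_cost=1.0),
    Candidate("b", size=4, caching_profit=10.0, recycling_cost=6.0),
    Candidate("c", size=4, caching_profit=5.0, recycling_cost=1.0),
]
probabilities = {"a": 0.9, "b": 0.5, "c": 0.4}

selected = select_prefetch_files(candidates, probabilities, budget=8)
placement = place_load_balanced(
    selected,
    node_load={"n1": 5, "n2": 2},
    node_capacity={"n1": 10, "n2": 10},
)
```

In this toy run, file `b` is rejected because its expected profit (0.5 × 10) does not cover its recycling cost, and the two remaining files are spread across both nodes instead of piling onto the initially lighter one. A greedy rule is used only for brevity; the paper may define selection and placement differently.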

Acknowledgements

The work was supported by the National Natural Science Foundation (NSF) under Grants (Nos. 61672397, 61873341, 61472294, 61771354), Application Foundation Frontier Project of WuHan (No. 2018010401011290), the Young Teachers’ Scientific Research Ability Promotion Project of Huanghuai University (No. 2017LX09), Beijing Intelligent Logistics System Collaborative Innovation Center Open Project (No. BILSCIC-2018KF-02), Key Laboratory of Agricultural Remote Sensing [2017002], Beijing Youth Top-notch Talent Plan of High-Creation Plan (No. 2017000026833ZK25), Canal Plan-Leading Talent Project of Beijing Tongzhou District (No. YHLB2017038), and Beijing Key Laboratory of Intelligent Logistics System (No. BZ0211). Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of the above agencies.

Author information

Corresponding author

Correspondence to Chunlin Li.

About this article

Cite this article

Li, C., Zhang, J. & Tang, H. Replica-aware task scheduling and load balanced cache placement for delay reduction in multi-cloud environment. J Supercomput 75, 2805–2836 (2019). https://doi.org/10.1007/s11227-018-2695-9

  • DOI: https://doi.org/10.1007/s11227-018-2695-9