Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments

  • Research Article
Frontiers of Computer Science

Abstract

Data Grid integrates geographically distributed resources to support data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting each job to a node where most of the requested data files are already available. Scheduling is a classic problem in parallel and distributed systems. However, because of the particular issues and goals of the Grid, traditional approaches are no longer effective in this environment, and methods specialized for this kind of system are needed. A complementary solution is data replication, which creates multiple copies of files and stores them in convenient locations to shorten file access times. To exploit both concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at the data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when there is not enough free space for a new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time each replica was requested, the number of accesses, the replica size, and the file transfer time. Simulation results show that the proposed algorithm outperforms other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
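The two-step replacement idea described in the abstract can be illustrated with a minimal Python sketch. It is not the ADHRS algorithm itself, since the abstract gives no formulas. The `Replica` fields, the `cheap_threshold` cutoff that decides which files count as having "minimum" transfer time, and the `keep_value` scoring formula are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # storage occupied, e.g. in MB
    transfer_time: float  # estimated time to fetch the file again
    last_access: float    # timestamp of the most recent request
    access_count: int     # number of requests seen so far


def keep_value(r, now):
    # Hypothetical score (not from the paper): replicas that are popular,
    # expensive to re-fetch, recently used, and small are worth keeping.
    age = now - r.last_access
    return r.access_count * r.transfer_time / (r.size * (1.0 + age))


def choose_victims(replicas, needed, free, now, cheap_threshold):
    """Return names of replicas to delete so that free >= needed.

    Returns None if deleting every replica would still not free enough space.
    """
    victims = []
    remaining = list(replicas)

    # Step 1: delete replicas that are cheap to transfer again, cheapest first.
    cheap = sorted(
        (r for r in replicas if r.transfer_time <= cheap_threshold),
        key=lambda r: r.transfer_time,
    )
    for r in cheap:
        if free >= needed:
            break
        victims.append(r.name)
        free += r.size
        remaining.remove(r)

    # Step 2: if space is still short, delete the least valuable of the rest,
    # judged by last request time, access count, size, and transfer time.
    for r in sorted(remaining, key=lambda r: keep_value(r, now)):
        if free >= needed:
            break
        victims.append(r.name)
        free += r.size

    return victims if free >= needed else None
```

For example, with replicas `a` (size 100, transfer time 2.0) and `d` (size 50, transfer time 1.0) under a `cheap_threshold` of 5, a request for 120 units of space is satisfied in step 1 alone by deleting `d` and then `a`. Step 2 is reached only when the cheap replicas do not free enough space.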

Author information

Corresponding author

Correspondence to Najme Mansouri.

Additional information

Najme Mansouri is currently a faculty member in computer science at Shahid Bahonar University of Kerman, Iran. She received her MS in software engineering from the Department of Computer Science & Engineering, College of Electrical & Computer Engineering, Shiraz University, Iran. She received her BS (Honor Student) in computer science from Shahid Bahonar University of Kerman, Iran, in 2009. Her research interests include parallel processing, distributed systems, and grid computing.

About this article

Cite this article

Mansouri, N. Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments. Front. Comput. Sci. 8, 391–408 (2014). https://doi.org/10.1007/s11704-014-3146-2

  • DOI: https://doi.org/10.1007/s11704-014-3146-2
