Abstract
Data Grid integrates geographically distributed resources to support data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for the new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
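The two-step replacement idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the four factors in the second step are combined, so the `retention_value` formula, the `cheap_threshold` parameter, and all class and function names below are assumptions made for illustration only, not the ADHRS definition.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # storage occupied (e.g. MB)
    transfer_time: float  # estimated time to fetch the file again
    last_access: float    # time of the most recent request
    access_count: int     # number of requests so far


def retention_value(r: Replica, now: float) -> float:
    """Hypothetical score (an assumption, not from the paper).

    Higher means more worth keeping: frequently and recently used,
    expensive to re-fetch, and small.
    """
    age = max(now - r.last_access, 1e-9)  # avoid division by zero
    return r.access_count * r.transfer_time / (age * r.size)


def choose_victims(replicas, needed, free, now, cheap_threshold):
    """Pick replicas to delete until at least `needed` space is free.

    Returns (victims, free_after). The caller must still check that
    free_after >= needed, since the store may simply be too small.
    """
    victims = []
    remaining = list(replicas)

    # Step 1: delete replicas that are cheapest to transfer again.
    # `cheap_threshold` is an assumed cut-off for "minimum transfer time".
    cheap = sorted(
        (r for r in remaining if r.transfer_time <= cheap_threshold),
        key=lambda r: r.transfer_time,
    )
    for r in cheap:
        if free >= needed:
            break
        victims.append(r)
        remaining.remove(r)
        free += r.size

    # Step 2: if space is still short, delete the least valuable replicas,
    # weighing last request time, access count, size and transfer time.
    for r in sorted(remaining, key=lambda r: retention_value(r, now)):
        if free >= needed:
            break
        victims.append(r)
        free += r.size

    return victims, free
```

Calling `choose_victims(replicas, needed, free, now, cheap_threshold)` on a storage element's replica list returns the deletion candidates in the order they would be removed. Any other monotone combination of the four factors could replace `retention_value` without changing the two-step structure.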
Author information
Najme Mansouri is currently a faculty member in computer science at Shahid Bahonar University of Kerman, Iran. She received her MS in software engineering from the Department of Computer Science & Engineering, College of Electrical & Computer Engineering, Shiraz University, Iran. She received her BS (Honor Student) in computer science from Shahid Bahonar University of Kerman, Iran, in 2009. Her research interests include parallel processing, distributed systems, and grid computing.
Cite this article
Mansouri, N. Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments. Front. Comput. Sci. 8, 391–408 (2014). https://doi.org/10.1007/s11704-014-3146-2
DOI: https://doi.org/10.1007/s11704-014-3146-2