Abstract
Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, which involves large-scale, data-intensive applications that produce and consume huge amounts of data distributed across a large number of machines. A data grid application consists of sets of independent tasks, each of which requires massive distributed data sets that may be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks, and appropriate storage resources should be selected to serve the files the tasks require. The problem can therefore be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler that is divided into three parts that can run in parallel and that uses both parallel tabu search and a parallel genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms that aim to minimize makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.
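The abstract splits data grid scheduling into storage selection and task assignment, with makespan as the objective. The sketch below illustrates that decomposition on a toy instance under simplifying assumptions of our own. Each file is fetched from the replica with the shortest transfer time, which is a greedy stand-in for storage selection. Tasks on one compute node run sequentially. A single-threaded tabu search reassigns tasks to compute nodes. This is not the authors' PTS-PGATS scheduler, which combines parallel tabu search with a parallel genetic algorithm. All names (`task_cost`, `makespan`, `tabu_search`) and all numbers in the example are hypothetical.

```python
from collections import deque


def task_cost(t, c, exec_time, files, file_size, replicas, bandwidth):
    """Execution time of task t on compute node c plus the time to fetch each
    file it needs from the replica with the shortest transfer time to c
    (a greedy stand-in for the storage-selection sub-problem)."""
    transfer = 0.0
    for f in files[t]:
        transfer += min(file_size[f] / bandwidth[s][c] for s in replicas[f])
    return exec_time[t][c] + transfer


def makespan(assign, cost, n_compute):
    """Largest per-node load, assuming tasks on one node run sequentially."""
    load = [0.0] * n_compute
    for t, c in enumerate(assign):
        load[c] += cost[t][c]
    return max(load)


def tabu_search(cost, n_compute, iters=200, tenure=5):
    """Move one task per step to the best admissible neighbor assignment."""
    n_tasks = len(cost)
    # Initial solution: every task on its individually cheapest node.
    assign = [min(range(n_compute), key=lambda c: cost[t][c]) for t in range(n_tasks)]
    best, best_span = list(assign), makespan(assign, cost, n_compute)
    tabu = deque(maxlen=tenure)  # (task, node) pairs a task recently left
    for _ in range(iters):
        candidate = None
        for t in range(n_tasks):
            for c in range(n_compute):
                if c == assign[t]:
                    continue
                trial = list(assign)
                trial[t] = c
                span = makespan(trial, cost, n_compute)
                # Aspiration: a tabu move is allowed only if it beats the best so far.
                if (t, c) in tabu and span >= best_span:
                    continue
                if candidate is None or span < candidate[0]:
                    candidate = (span, t, c)
        if candidate is None:  # every move is tabu
            break
        span, t, c = candidate
        tabu.append((t, assign[t]))  # forbid moving t straight back
        assign[t] = c
        if span < best_span:
            best, best_span = list(assign), span
    return best, best_span


# Toy instance (hypothetical numbers): 4 tasks, 2 compute nodes, 2 storage nodes.
exec_time = [[4, 6], [3, 2], [5, 5], [2, 3]]  # exec_time[task][compute]
files = [[0], [1], [0, 2], [2]]               # files needed by each task
file_size = [10, 20, 10]
replicas = [[0], [0, 1], [1]]                 # storage nodes holding each file
bandwidth = [[10, 5], [5, 10]]                # bandwidth[storage][compute]

cost = [[task_cost(t, c, exec_time, files, file_size, replicas, bandwidth)
         for c in range(2)] for t in range(4)]
best, best_span = tabu_search(cost, n_compute=2)
print(best, best_span)  # [1, 1, 0, 0] 12.0
```

On this instance, the greedy start puts three tasks on node 0 for a makespan of 17. The first tabu move balances the load to 12. Because the tabu list records the node a task has just left, the search cannot immediately undo a move, and the aspiration rule lets a tabu move through only when it improves on the best makespan. The parallel tabu search and genetic algorithm components described in the abstract are not reproduced here.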
Author information
Kenli Li received his PhD in Computer Science from Huazhong University of Science and Technology, China, in 2003, his MSc in Mathematics from Central South University, China, in 2000, and his BSc in Mathematics from the National University of Defense Technology, China, in 1995. He was a visiting scholar at the University of Illinois at Urbana-Champaign from 2004 to 2005. He is now a professor of Computer Science and Technology at Hunan University and a senior member of the CCF. His major research interests include parallel computing, grid and cloud computing, and DNA computing.
Zhao Tong received his MSc from Hunan Agricultural University, China, in 2010, and his BSc in Computer Science from Beijing Institute of Technology in 2007. He is currently a PhD candidate at Hunan University, China. His research interests include modeling and scheduling for parallel and distributed computing systems, parallel system reliability, and parallel algorithms.
Dan Liu received his BSc in Accounting and his MSc in Computer Science from Hunan University, China, in 2007 and 2010, respectively. His research interests focus on distributed computing, high-performance computing, and grid computing.
Teklay Tesfazghi is a lecturer at MaiNefhi Institute of Technology, Eritrea. He received his BSc in Mathematics from the University of Asmara in 1996 and his MSc in Computer Engineering and Applications from Hunan University, China, in 2011. His research interests include distributed computing, parallel databases, and data mining.
Xiangke Liao received his BSc in Computer Science from Tsinghua University, China, in 1985, and his MSc in 1988 from the National University of Defense Technology, China, where he is now a professor and the dean of the School of Computers. His research interests include parallel and distributed computing, high-performance computer systems, operating systems, and networked embedded systems.
About this article
Cite this article
Li, K., Tong, Z., Liu, D. et al. A PTS-PGATS based approach for data-intensive scheduling in data grids. Front. Comput. Sci. China 5, 513–525 (2011). https://doi.org/10.1007/s11704-011-0970-5
DOI: https://doi.org/10.1007/s11704-011-0970-5