A PTS-PGATS based approach for data-intensive scheduling in data grids

  • Research Article
Frontiers of Computer Science in China

Abstract

Grid computing combines computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, which involves large-scale data-intensive applications that produce and consume huge amounts of data distributed across a large number of machines. A data grid application is composed of sets of independent tasks, each of which requires massive distributed data sets that may each be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks, and appropriate storage resources should be selected to serve the files the tasks require. The problem can therefore be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler that is divided into three parts that can run in parallel and that uses both parallel tabu search and a parallel genetic algorithm. The proposed algorithm is evaluated by comparing it with related algorithms that also aim to minimize makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.
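
To make the two sub-problems concrete, the following is a minimal Python sketch of how a candidate schedule might be scored by its makespan. It is not the paper's cost model: the names `Task`, `makespan`, `assignment`, and `replica_choice` are hypothetical, and the sketch assumes that tasks on one computing resource run one after another and that each task fetches its files sequentially before it executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    length: float   # amount of computation (assumed unit: instructions)
    files: tuple    # names of the files the task must read


def makespan(tasks, assignment, replica_choice, speed, file_size, bandwidth):
    """Return the finish time of the busiest computing resource.

    assignment[i]      -> computing resource chosen for task i
    replica_choice[i]  -> {file name: storage resource serving it to task i}
    speed[c]           -> processing rate of computing resource c
    file_size[f]       -> size of file f
    bandwidth[(s, c)]  -> transfer rate from storage s to computing resource c
    """
    finish = {c: 0.0 for c in speed}
    for i, task in enumerate(tasks):
        c = assignment[i]
        # Sub-problem 1: the chosen replicas determine the transfer time.
        transfer = sum(
            file_size[f] / bandwidth[(replica_choice[i][f], c)]
            for f in task.files
        )
        # Sub-problem 2: the chosen computing resource determines run time.
        finish[c] += transfer + task.length / speed[c]
    return max(finish.values())


# Illustrative data only; none of these values come from the paper.
tasks = [Task(100.0, ("a",)), Task(200.0, ("a", "b"))]
speed = {"c1": 10.0, "c2": 20.0}
file_size = {"a": 50.0, "b": 100.0}
bandwidth = {
    ("s1", "c1"): 10.0, ("s1", "c2"): 5.0,
    ("s2", "c1"): 5.0, ("s2", "c2"): 20.0,
}
assignment = ["c1", "c2"]
replica_choice = [{"a": "s1"}, {"a": "s2", "b": "s2"}]

result = makespan(tasks, assignment, replica_choice, speed, file_size, bandwidth)
print(result)  # 17.5
```

A search method such as tabu search or a genetic algorithm would call a scoring function of this kind repeatedly while varying `assignment` and `replica_choice`, keeping the candidates with the smallest makespan.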

Author information

Corresponding author

Correspondence to Kenli Li.

Additional information

Kenli Li received his PhD in Computer Science from Huazhong University of Science and Technology, China, in 2003, his MSc in Mathematics from Central South University, China, in 2000, and his BSc in Mathematics from the National University of Defense Technology in 1995. He was a visiting scholar at the University of Illinois at Urbana-Champaign from 2004 to 2005. He is now a professor of Computer Science and Technology at Hunan University and a senior member of the CCF. His major research interests include parallel computing, grid and cloud computing, and DNA computing.

Zhao Tong received his MSc from Hunan Agricultural University, China, in 2010, and his BSc in Computer Science from Beijing Institute of Technology in 2007. He is currently a PhD candidate at Hunan University, China. His research interests include modeling and scheduling for parallel and distributed computing systems, parallel system reliability, and parallel algorithms.

Dan Liu received his BSc in accounting and MSc in Computer Science from Hunan University, China, in 2007 and 2010, respectively. His research interests focus on distributed computing, high performance computing, and grid computing.

Teklay Tesfazghi is a lecturer at MaiNefhi Institute of Technology, Eritrea. He received his BSc in Mathematics from the University of Asmara in 1996 and his MSc in Computer Engineering and Applications from Hunan University, China, in 2011. His research interests include distributed computing, parallel databases, and data mining.

Xiangke Liao received his BSc in Computer Science from Tsinghua University, China, in 1985, and his MSc in 1988 from the National University of Defense Technology, China, where he is now a professor and the dean of the School of Computers. His research interests include parallel and distributed computing, high-performance computer systems, operating systems, and networked embedded systems.

About this article

Cite this article

Li, K., Tong, Z., Liu, D. et al. A PTS-PGATS based approach for data-intensive scheduling in data grids. Front. Comput. Sci. China 5, 513–525 (2011). https://doi.org/10.1007/s11704-011-0970-5
