On the Scalability of Dynamic Scheduling Scientific Applications with Adaptive Weighted Factoring

Cluster Computing

Abstract

In heterogeneous environments, dynamic scheduling algorithms are a powerful tool for improving the performance of scientific applications through load balancing. However, these scheduling techniques employ heuristics that require prior knowledge of the workload, obtained via profiling, which results in higher overhead as problem sizes and the number of processors increase. In addition, load imbalance may appear only at run time, making profiling tedious and sometimes even obsolete. Recently, the integration of dynamic loop scheduling algorithms into a number of scientific applications has proven effective. This paper reports on performance improvements obtained by integrating Adaptive Weighted Factoring, a recently proposed dynamic loop scheduling technique that addresses these concerns, into two scientific applications: computational field simulation on unstructured grids and N-Body simulations. The reported experimental results confirm the benefits of this methodology and emphasize its high potential for future integration into other scientific applications that exhibit substantial performance degradation due to load imbalance.
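The abstract names the technique without describing its mechanics. As a rough illustration only, the Python sketch below shows the general idea behind factoring-style loop scheduling with per-processor weights: iterations are handed out in batches, each batch covers about half of what remains, and each processor's chunk is scaled by its weight. The function names, the rounding rule, and the rate-based weight update are assumptions made for this sketch; they are a simplified stand-in and not the authors' Adaptive Weighted Factoring implementation.

```python
import math


def weighted_factoring_batches(total_iterations, weights):
    """Split loop iterations into batches of per-processor chunks.

    Each batch targets roughly half of the remaining iterations and
    divides that share in proportion to the (positive) weights.
    Rounding up and the minimum chunk of 1 are assumptions of this sketch.
    """
    total_weight = sum(weights)
    remaining = total_iterations
    batches = []
    while remaining > 0:
        batch_total = math.ceil(remaining / 2)
        batch = []
        for w in weights:
            share = math.ceil(batch_total * w / total_weight)
            # Never hand out more than what is left; 0 once the loop is exhausted.
            size = min(max(share, 1), remaining)
            batch.append(size)
            remaining -= size
        batches.append(batch)
    return batches


def update_weights(iterations_done, elapsed_seconds):
    """Hypothetical adaptive step: derive weights from measured rates.

    Weights are normalized so that they sum to the number of processors,
    so a processor of average speed gets weight 1.0.
    """
    rates = [n / t for n, t in zip(iterations_done, elapsed_seconds)]
    mean_rate = sum(rates) / len(rates)
    return [r / mean_rate for r in rates]


if __name__ == "__main__":
    # Processor 0 completed twice as many iterations per second as processor 1.
    weights = update_weights([100, 50], [1.0, 1.0])
    for batch in weighted_factoring_batches(1000, weights):
        print(batch)
```

Shrinking the batches geometrically keeps scheduling overhead low early on while leaving small chunks at the end to even out finishing times. Recomputing the weights from measurements taken at run time is what removes the need for profiling beforehand.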

Author information

Corresponding author

Correspondence to Ioana Banicescu.

About this article

Cite this article

Banicescu, I., Velusamy, V. & Devaprasad, J. On the Scalability of Dynamic Scheduling Scientific Applications with Adaptive Weighted Factoring. Cluster Computing 6, 215–226 (2003). https://doi.org/10.1023/A:1023588520138

  • DOI: https://doi.org/10.1023/A:1023588520138
