Abstract
To fully exploit the power of a parallel computer, an application must be distributed onto processors so that, as much as possible, each has an equal-sized, independent portion of the work. There is a tension between balancing processor loads and maximizing locality, as the dynamic re-assignment of work necessitates access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals.
Fractiling accommodates load imbalances caused by predictable phenomena, such as irregular data, and unpredictable phenomena, such as data-access latencies. Probabilistic analysis gives evidence that it should achieve close-to-optimal load balance. We have applied fractiling to two applications, an N-body problem and dense matrix multiplication, running on shared-address-space and private-address-space parallel machines, namely the Kendall Square KSR1 and the IBM SP1. Although the applications contained little or no algorithmic variance, fractiling improved performance over static scheduling due to systemic variance; however, artifacts of the memory subsystems of the two architectures impeded the scalability of the fractiled code.
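The chunk-size schedule underlying fractiling descends from the factoring scheme: each scheduling round hands out a fixed fraction (here, half) of the remaining iterations, split evenly across processors, so chunks shrink geometrically and late-round imbalance stays small. The sketch below is illustrative only, not the authors' implementation; the function name and the halving ratio are assumptions, and in fractiling proper each processor would draw its chunks from its own self-similar subtile first, touching remote tiles only after its local work is exhausted.

```python
def factoring_chunks(n, p):
    """Yield chunk sizes for n loop iterations on p processors.

    Each round distributes roughly half of the remaining iterations,
    split p ways, so chunk sizes decrease geometrically. (Illustrative
    sketch of a factoring-style schedule, not the paper's code.)
    """
    remaining = n
    while remaining > 0:
        # Half the remaining work, divided among p processors; at least 1.
        chunk = max(1, remaining // (2 * p))
        for _ in range(p):
            if remaining == 0:
                break
            take = min(chunk, remaining)
            yield take
            remaining -= take

# Example: 100 iterations on 4 processors yields large chunks first,
# then progressively smaller ones for end-of-loop balancing.
print(list(factoring_chunks(100, 4)))
```

In a fractiled execution, each of these chunks would additionally be mapped onto a self-similar subdivision of a processor's local tile, so that dynamic re-assignment of the small trailing chunks disturbs locality as little as possible.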
Research supported by ARPA/USAF under Grant no. F30602-95-1-0008 and the New York State Science and Technology Foundation. The research was conducted using the resources of the Cornell Theory Center, which receives major funding from the National Science Foundation and New York State; additional funding comes from the Advanced Research Projects Agency, the National Institutes of Health, IBM Corporation, and other members of the center's Corporate Research Institute. Susan Hummel was also supported in part by NSF Grant CCR-9321424; Joel Wein by NSF Grant CCR-9211494. We thank Bob Walkup for his assistance in programming the SP1.
© 1996 Springer Science+Business Media New York
Hummel, S.F., Banicescu, I., Wang, CT., Wein, J. (1996). Load Balancing and Data Locality Via Fractiling: An Experimental Study. In: Szymanski, B.K., Sinharoy, B. (eds) Languages, Compilers and Run-Time Systems for Scalable Computers. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2315-4_7
Print ISBN: 978-1-4613-5979-1
Online ISBN: 978-1-4615-2315-4