Combining Performance Aspects of Irregular Gauss-Seidel Via Sparse Tiling

Strout, Michelle Mills; Carter, Larry; Ferrante, Jeanne; Freeman, Jonathan; Kreaseck, Barbara

doi:10.1007/11596110_7


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 2481)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Finite element problems are often solved using multigrid techniques. The most time-consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes-based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes-based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration data locality, and inter-iteration data locality) are combined.
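
For readers unfamiliar with the smoother discussed in the abstract, the following is a minimal sketch of a plain, serial Gauss-Seidel sweep over a sparse matrix in compressed sparse row (CSR) form. It is not the sparse tiling method of the paper. The function names (`gauss_seidel_sweep`, `smooth`), the CSR array names, and the small test matrix are all assumptions made for illustration, and the sketch assumes every row has a nonzero diagonal entry. Data reused within one call to `gauss_seidel_sweep` corresponds to intra-iteration reuse, and data reused across successive calls in `smooth` corresponds to inter-iteration reuse.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """Perform one in-place Gauss-Seidel sweep on a CSR matrix.

    Assumes each row stores a nonzero diagonal entry.
    """
    n = len(b)
    for i in range(n):
        diag = 0.0
        acc = b[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                # x[j] already holds this sweep's value when j < i,
                # and the previous sweep's value when j > i.
                acc -= vals[k] * x[j]
        x[i] = acc / diag
    return x


def smooth(row_ptr, col_idx, vals, b, x, num_sweeps):
    """Apply several sweeps; each sweep rereads the whole matrix and x."""
    for _ in range(num_sweeps):
        gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
    return x


# Hypothetical 3x3 test system: [[4,-1,0],[-1,4,-1],[0,-1,4]] x = [3,2,3],
# whose exact solution is [1, 1, 1].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]
x = smooth(row_ptr, col_idx, vals, b, [0.0, 0.0, 0.0], 25)
```

Because each unknown depends on neighbors updated earlier in the same sweep, the loop over `i` cannot be split across processors naively. This ordering constraint is what the multi-coloring, owner-computes, and sparse tiling approaches named in the abstract each address in their own way.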

Author information

Authors and Affiliations

University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0114, USA
Michelle Mills Strout, Larry Carter, Jeanne Ferrante, Jonathan Freeman & Barbara Kreaseck

Editor information

Editors and Affiliations

Department of Computer Science, University of Maryland, 4135 A.V. Williams Bldg., College Park, MD 20742, USA
Bill Pugh
Department of Computer Science, University of Maryland at College Park
Chau-Wen Tseng

About this paper

Cite this paper

Strout, M.M., Carter, L., Ferrante, J., Freeman, J., Kreaseck, B. (2005). Combining Performance Aspects of Irregular Gauss-Seidel Via Sparse Tiling. In: Pugh, B., Tseng, CW. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_7

DOI: https://doi.org/10.1007/11596110_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30781-5
Online ISBN: 978-3-540-31612-1
eBook Packages: Computer Science, Computer Science (R0)
