Abstract
Partial differential equation solvers spend most of their computation time performing nearest-neighbor (stencil) computations on grids that model spatial domains. Tiling is an effective performance optimization for improving data locality and enabling coarse-grain parallelization of such computations. However, when the domains are periodic, tiling through time is not directly applicable because of wrap-around dependencies. It is possible to tile within the spatial domain, but tiling across time (i.e., time skewing) is not legal, since no constant skewing can render all loops fully permutable. We introduce a technique called smashing that maps a periodic domain to computer memory without creating any wrap-around dependencies. For a periodic cylinder domain where time skewing improves performance, the performance of smashing is comparable to that of another method, circular skewing, which also handles the periodicity of a cylinder. Unlike circular skewing, smashing can remove wrap-around dependencies for an icosahedron model of Earth's atmosphere.
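To make the wrap-around problem concrete, consider a 3-point stencil on a periodic ring of N cells stored in the usual order. Its spatial dependence distances are -1, 0, and +1, plus ±(N-1) from the wrap-around, so a skew factor s must satisfy s ≥ N-1 to make every distance non-negative, which grows with the problem size rather than staying constant. The sketch below illustrates one way a ring can be folded flat so that every neighbor, including the wrap-around pair, sits within two memory slots. It is an illustration under our own assumptions, not necessarily the construction used in the paper: the names `fold`, `reflect`, `step_folded`, and the other helpers are hypothetical, and the interleaved even/odd layout is simply one well-known folding of a 1-D periodic domain.

```python
def fold(i, n):
    """Map ring index i (0 <= i < n) to a slot of a folded 1-D array."""
    half = (n + 1) // 2
    # First half of the ring -> even slots; second half, reversed -> odd slots.
    return 2 * i if i < half else 2 * (n - 1 - i) + 1


def identity(i, n):
    """Conventional layout: cell i lives in slot i."""
    return i


def max_dependence_distance(n, layout):
    """Largest slot distance between any cell and its periodic right neighbor."""
    return max(abs(layout((i + 1) % n, n) - layout(i, n)) for i in range(n))


def step_periodic(u):
    """One 3-point averaging step with explicit wrap-around (modulo) indexing."""
    n = len(u)
    return [(u[i] + (u[(i - 1) % n] + u[(i + 1) % n])) / 3.0 for i in range(n)]


def reflect(s, n):
    """Bounce an out-of-range slot back across the fold at either end."""
    if s < 0:
        return -s - 1
    if s >= n:
        return 2 * n - 1 - s
    return s


def step_folded(f):
    """Same step on the folded array; every read is within 2 slots (needs n >= 2)."""
    n = len(f)
    return [(f[s] + (f[reflect(s - 2, n)] + f[reflect(s + 2, n)])) / 3.0
            for s in range(n)]


def to_folded(u):
    """Copy ring-ordered values into the folded layout."""
    n = len(u)
    f = [0.0] * n
    for i, value in enumerate(u):
        f[fold(i, n)] = value
    return f


def from_folded(f):
    """Read the folded layout back in ring order."""
    n = len(f)
    return [f[fold(i, n)] for i in range(n)]
```

For a ring of 8 cells, `max_dependence_distance(8, identity)` is 7 while `max_dependence_distance(8, fold)` is 2, and `from_folded(step_folded(to_folded(u)))` matches `step_periodic(u)`. Because the folded dependence distances are bounded by a constant independent of N, a constant skew becomes sufficient in this 1-D setting.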
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Osheim, N., Strout, M.M., Rostron, D., Rajopadhye, S. (2008). Smashing: Folding Space to Tile through Time. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_6
DOI: https://doi.org/10.1007/978-3-540-89740-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89739-2
Online ISBN: 978-3-540-89740-8
eBook Packages: Computer Science, Computer Science (R0)