Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8967)

Abstract

In this paper, we develop a novel methodology that takes multithreaded many-core designs into account to better utilize memory and processing resources and to improve memory residency in tileable applications. It combines polyhedral analysis and transformation, in the form of PLUTO [6], with a highly optimized fine-grain tile runtime to exploit parallelism at all levels. The main contributions of this paper are multi-hierarchical tiling techniques that increase intra-tile parallelism, and a data-flow inspired runtime library that allows parallel tiles to be expressed with an efficient synchronization registry. On an Intel Xeon Phi board, our current implementation achieves performance improvements of up to 32.25% over code produced by state-of-the-art compiler frameworks for selected stencil applications.
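The abstract describes a data-flow inspired runtime in which parallel tiles are synchronized through a registry. The Python sketch below is a hedged illustration of that general idea only, not the authors' library and not their jagged tiling. It assumes plain square tiles over a 2D iteration space in which each point reads its upper and left neighbours, and it uses a hypothetical per-tile counter of unfinished predecessors (`pending`) as a stand-in for a synchronization registry. All names and sizes (`N`, `B`, `update_point`, `run_tile`, `run_dataflow`) are illustrative assumptions. A tile is enqueued as soon as its last predecessor finishes, so tiles on the same anti-diagonal become ready together.

```python
from collections import deque

# Hypothetical sizes for illustration only (not taken from the paper).
N = 64         # points per dimension of the 2D iteration space
B = 16         # tile edge length
NT = N // B    # tiles per dimension


def init_grid():
    # Deterministic initial values so different execution orders can be compared.
    return [[0.001 * ((i * 31 + j * 17) % 13) for j in range(N)] for i in range(N)]


def update_point(a, i, j):
    # Each point reads its upper and left neighbours: a (1,0)/(0,1) dependence.
    up = a[i - 1][j] if i > 0 else 1.0
    left = a[i][j - 1] if j > 0 else 1.0
    a[i][j] = 0.5 * (up + left) + a[i][j]


def run_tile(a, ti, tj):
    # Points inside a tile run in row-major order, which respects both dependences.
    for i in range(ti * B, (ti + 1) * B):
        for j in range(tj * B, (tj + 1) * B):
            update_point(a, i, j)


def run_dataflow(a):
    # Hypothetical "registry": pending[ti][tj] counts predecessor tiles not yet finished.
    pending = [[(ti > 0) + (tj > 0) for tj in range(NT)] for ti in range(NT)]
    ready = deque([(0, 0)])  # tile (0, 0) has no predecessors
    order = []
    while ready:
        ti, tj = ready.popleft()
        run_tile(a, ti, tj)
        order.append((ti, tj))
        # Signal successors; a tile becomes ready when its last predecessor finishes.
        for si, sj in ((ti + 1, tj), (ti, tj + 1)):
            if si < NT and sj < NT:
                pending[si][sj] -= 1
                if pending[si][sj] == 0:
                    ready.append((si, sj))
    return order


if __name__ == "__main__":
    ref = init_grid()
    for i in range(N):
        for j in range(N):
            update_point(ref, i, j)
    tiled = init_grid()
    order = run_dataflow(tiled)
    print(len(order), tiled == ref)
```

Running the script prints `16 True`: each of the 16 tiles executes exactly once, and the dependence-driven tile order reproduces the plain row-major sweep exactly. The loop is single-threaded for determinism; a multithreaded runtime would instead hand ready tiles to worker threads and would need atomic decrements on the counters.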

Notes

  1. Where \(m\) is less than or equal to the number of dimensions of the iteration space.

  2. The parallel hyperplane.

  3. Where \(n\) is the size of a dimension of the iteration space. In our example, both dimensions have the same size.

References

  1. perf: Linux profiling with performance counters

  2. Bandishti, V., Pananilath, I., Bondhugula, U.: Tiling stencil computations to maximize parallelism. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2012, Los Alamitos, CA, USA, pp. 40:1–40:11 (2012)

  3. Baskaran, M.M., et al.: Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1–10. ACM (2008)

  4. Bastoul, C.: Generating loops for scanning polyhedra: CLooG user's guide. Polyhedron 2, 10 (2004)

  5. Bikshandi, G., et al.: Programming for parallelism and locality with hierarchically tiled arrays. In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2006, pp. 48–57. ACM, New York (2006)

  6. Bondhugula, U., Ramanujam, J.: PLUTO: a practical and fully automatic polyhedral parallelizer and locality optimizer (2007)

  7. Intel Open Source Technology Center: Open Community Runtime (2012)

  8. Cepeda, S.: Optimization and performance tuning for Intel Xeon Phi coprocessors, part 2: understanding and using hardware events (2012)

  9. Datta, K., Kamil, S., Williams, S., Oliker, L., Shalf, J., Yelick, K.: Optimization and performance modeling of stencil computations on modern microprocessors. SIAM Rev. (2008)

  10. Dursun, H., et al.: Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters. J. Supercomput. 62(2), 946–966 (2012)

  11. Feautrier, P.: Some efficient solutions to the affine scheduling problem. I. One-dimensional time. Int. J. Parallel Program. 21(5), 313–347 (1992)

  12. Feautrier, P.: Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program. 21(6), 389–420 (1992)

  13. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS 1999, p. 285. IEEE Computer Society, Washington, DC (1999)

  14. Gan, G., Wang, X., Manzano, J., Gao, G.R.: Tile percolation: an OpenMP tile aware parallelization technique for the Cyclops-64 multicore processor. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 839–850. Springer, Heidelberg (2009)

  15. Griebl, M., Lengauer, C., Wetzel, S.: Code generation in the polytope model. In: Proceedings 1998 International Conference on Parallel Architectures and Compilation Techniques, pp. 106–111. IEEE (1998)

  16. Grosser, T., Verdoolaege, S., Cohen, A., Sadayappan, P.: The relation between diamond tiling and hexagonal tiling. In: HiStencils 2014, p. 65 (2014)

  17. Högstedt, K., Carter, L., Ferrante, J.: Selecting tile shape for minimal execution time. In: Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 201–211. ACM (1999)

  18. ET International: SWARM (SWift Adaptive Runtime Machine) (2012)

  19. Kim, D., et al.: Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors. In: Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO 2004, p. 27. IEEE Computer Society, Washington, DC (2004)

  20. Kodukula, I., Ahmed, N., Pingali, K.: Data-centric multi-level blocking, pp. 346–357 (1997)

  21. Lewis, J., et al.: An automatic prefetching and caching system. In: 2010 IEEE 29th International Performance Computing and Communications Conference (IPCCC), pp. 180–187, December 2010

  22. Tanguay, D.O.J.: Compile-time Loop Splitting for Distributed Memory Multiprocessors. Massachusetts Institute of Technology, Laboratory for Computer Science, Department of Electrical Engineering and Computer Science (1993)

  23. Theobald, K.B.: EARTH: An Efficient Architecture for Running Threads. McGill University, Montreal (1999)

  24. Wilde, D.K.: A library for doing polyhedral operations, Technical report (1997)

  25. Wolf, M.E., Lam, M.S.: A data locality optimizing algorithm. In: Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, PLDI 1991, pp. 30–44. ACM, New York (1991)

  26. Wolfe, M.: More iteration space tiling. In: Proceedings of the 1989 ACM/IEEE Conference on Supercomputing, Supercomputing 1989, pp. 655–664. ACM, New York (1989)

  27. Wolfe, M.: Iteration space tiling for memory hierarchies. In: Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp. 357–361. Society for Industrial and Applied Mathematics, Philadelphia (1989)

Author information

Correspondence to Sunil Shrestha.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Shrestha, S., Manzano, J., Marquez, A., Feo, J., Gao, G.R. (2015). Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol. 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_11

  • DOI: https://doi.org/10.1007/978-3-319-17473-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17472-3

  • Online ISBN: 978-3-319-17473-0

  • eBook Packages: Computer Science, Computer Science (R0)
