A Practical and Aggressive Loop Fission Technique

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11338)

Abstract

Loop fission is an effective loop optimization for exploiting fine-grained parallelism, and it is widely used in existing parallelizing compilers. To fully exploit this optimization, we propose and implement a practical and aggressive loop fission technique. First, we present an aggressive dependence graph pruning method that eliminates pseudo dependences caused by conservative compiler analysis. Second, we introduce a topological-sort-based loop fission algorithm that distributes loops correctly. Finally, to improve the performance of generated programs that offer opportunities for loop fission, we propose an advanced loop fission strategy. We evaluate these techniques and algorithms experimentally.
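
The abstract does not spell out the fission algorithm, so the following is only a minimal sketch of the classical idea it builds on, not the authors' implementation: statements that form a dependence cycle must stay in the same loop, and the resulting groups are emitted in a topological order of the dependence graph. The function name `fission_order`, the statement labels, and the dependence edges are all hypothetical and chosen for illustration.

```python
def fission_order(statements, deps):
    """Group loop-body statements into distributed loops.

    statements: labels in original program order (hypothetical).
    deps: (src, dst) pairs meaning dst depends on src (assumed input).
    Returns a list of groups; each group would become one loop, and the
    groups are listed in an order that respects every dependence.
    """
    succ = {s: [] for s in statements}
    for src, dst in deps:
        succ[src].append(dst)

    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        # Tarjan's strongly connected components algorithm.
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            # Keep original statement order inside each new loop.
            sccs.append(sorted(comp, key=statements.index))

    for s in statements:
        if s not in index:
            visit(s)

    # Tarjan emits components in reverse topological order.
    return sccs[::-1]


# Hypothetical loop body: S2 and S3 form a dependence cycle.
stmts = ["S1", "S2", "S3", "S4"]
edges = [("S1", "S2"), ("S2", "S3"), ("S3", "S2"), ("S1", "S4")]
loops = fission_order(stmts, edges)
```

In this example `S2` and `S3` end up in one loop because they depend on each other, while `S1` and `S4` each get their own loop, with `S1` placed before every loop that depends on it. Pruning a pseudo dependence, in the sense the abstract describes, would correspond to deleting an edge from `edges` before calling the function, which can break a cycle and allow a finer split.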

Author information

Corresponding author

Correspondence to Bo Zhao.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, B. et al. (2018). A Practical and Aggressive Loop Fission Technique. In: Hu, T., Wang, F., Li, H., Wang, Q. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11338. Springer, Cham. https://doi.org/10.1007/978-3-030-05234-8_9

  • DOI: https://doi.org/10.1007/978-3-030-05234-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05233-1

  • Online ISBN: 978-3-030-05234-8

  • eBook Packages: Computer Science, Computer Science (R0)
