Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing

International Journal of Parallel Programming

Abstract

Reversible computing aims at keeping all information on input and intermediate values available at any step of the computation, making information virtually present everywhere. Rematerialization in register allocation amounts to recomputing values instead of spilling them to memory when registers run out. In this paper we detail a heuristic algorithm for exploiting reverse computing for register rematerialization. This improves information locality, as it provides more opportunities for retrieving data. Rematerialization adds instructions, and we show on one specifically designed example that reverse computing may alleviate the impact of these additional instructions on performance. We also show how thread parallelism may be optimized on GPUs by performing register allocation with reverse computing, which increases the number of threads per Streaming Multiprocessor. This is done on the main kernel of a Lattice Quantum Chromodynamics simulation program, where we gain an 11% speedup.
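To make the distinction between spilling, classic rematerialization and reverse computing concrete, here is a minimal Python sketch on a toy straight-line program. It is not the authors' heuristic: the `Instr` format, the `recover` function and the fixed priority order (recompute forward, then invert, then spill) are illustrative assumptions. The point is that when a value `x` has been evicted, it may still be recovered from a live result such as `y = x + 4` by applying the inverse operation, even when the operands needed to recompute `x` forward are no longer live.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical three-address instruction: dest = op(a, b).
# An operand is a register name (str) or an integer constant (int).
Operand = Union[str, int]


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str  # "add", "sub" or "mul"
    a: Operand
    b: Operand


def available(x: Operand, live: set) -> bool:
    # Constants are always available; a register only while it is live.
    return isinstance(x, int) or x in live


def recover(value: str, program: list, live: set) -> str:
    """Pick a way to re-create `value` once no register holds it (toy model)."""
    # 1. Classic rematerialization: re-run the defining instruction,
    #    which requires its operands to still be available.
    for ins in program:
        if ins.dest == value and available(ins.a, live) and available(ins.b, live):
            return f"remat: {value} = {ins.op}({ins.a}, {ins.b})"
    # 2. Reverse computing: find a live result d computed from `value`
    #    by an add/sub and invert that operation. mul is not inverted
    #    in this sketch.
    for ins in program:
        if ins.dest not in live or ins.op not in ("add", "sub"):
            continue
        d = ins.dest
        if ins.a == value and available(ins.b, live):
            # d = x + c  ->  x = d - c ;  d = x - c  ->  x = d + c
            inv = "sub" if ins.op == "add" else "add"
            return f"reverse: {value} = {inv}({d}, {ins.b})"
        if ins.b == value and available(ins.a, live):
            # d = c + x  ->  x = d - c ;  d = c - x  ->  x = c - d
            if ins.op == "add":
                return f"reverse: {value} = sub({d}, {ins.a})"
            return f"reverse: {value} = sub({ins.a}, {d})"
    # 3. Otherwise fall back to spilling: store to memory, reload before use.
    return f"spill: reload {value} from memory"


program = [
    Instr("x", "mul", "a", "b"),  # x = a * b
    Instr("y", "add", "x", 4),    # y = x + 4
    Instr("z", "sub", 10, "x"),   # z = 10 - x
]

print(recover("x", program, {"a", "b", "y"}))  # remat: x = mul(a, b)
print(recover("x", program, {"y"}))            # reverse: x = sub(y, 4)
print(recover("x", program, {"z"}))            # reverse: x = sub(10, z)
print(recover("x", program, set()))            # spill: reload x from memory
```

The inverse step is exact here because the values are integers; with floating-point add/sub, rounding can make the inverse inexact, so a real allocator would likely need to restrict where reverse computing is applied.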

Author information

Correspondence to Mouad Bahi.

About this article

Cite this article

Bahi, M., Eisenbeis, C. Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing. Int J Parallel Prog 42, 49–76 (2014). https://doi.org/10.1007/s10766-012-0212-y

  • DOI: https://doi.org/10.1007/s10766-012-0212-y
