Abstract
Reversible computing aims at keeping all information on input and intermediate values available at every step of the computation, making information virtually present everywhere. Rematerialization in register allocation amounts to recomputing values instead of spilling them to memory when registers run out. In this paper we detail a heuristic algorithm that exploits reverse computing for register rematerialization. This improves information locality, as it provides more opportunities for retrieving data. Rematerialization adds instructions, and we show on a specifically designed example that reverse computing may alleviate the impact of these additional instructions on performance. We also show how thread parallelism may be optimized on GPUs by performing register allocation with reverse computing, which increases the number of threads per Streaming Multiprocessor. This is demonstrated on the main kernel of a Lattice Quantum Chromodynamics (LQCD) simulation program, where we obtain an 11 % speedup.
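The core idea of rematerialization through reverse computing can be illustrated on a toy straight-line program. If `y = x + 5` has been computed and `x` is later evicted from its register, `x` does not have to be spilled or recomputed from its own operands; it can be recovered from the still-live `y` by applying the inverse operation, `x = y - 5`. The sketch below assumes a hypothetical three-address IR (`Instr`, `execute`, `reverse_remat` are illustrative names, not the authors' implementation) and treats only integer `add`, `sub`, and `xor` with a constant operand as invertible. It does not reproduce the paper's heuristic for choosing which values to rematerialize.

```python
from dataclasses import dataclass

# Hypothetical toy IR: only these ops are treated as invertible when the
# second operand is an immediate constant (an assumption for illustration).
INVERSE = {"add": "sub", "sub": "add", "xor": "xor"}


@dataclass(frozen=True)
class Instr:
    dst: str   # destination register name
    op: str    # "add", "sub", or "xor"
    src: str   # source register name
    imm: int   # immediate constant operand


def execute(instrs, regs):
    """Run a straight-line instruction sequence over a dict of register values."""
    regs = dict(regs)
    for i in instrs:
        a = regs[i.src]
        if i.op == "add":
            regs[i.dst] = a + i.imm
        elif i.op == "sub":
            regs[i.dst] = a - i.imm
        elif i.op == "xor":
            regs[i.dst] = a ^ i.imm
        else:
            raise ValueError(f"unknown op {i.op}")
    return regs


def reverse_remat(defn: Instr) -> Instr:
    """Given defn: dst = src op imm, build the inverse instruction that
    recomputes src from dst, so src need not stay live or be spilled."""
    if defn.op not in INVERSE:
        raise ValueError(f"{defn.op} is not invertible with a constant operand")
    return Instr(dst=defn.src, op=INVERSE[defn.op], src=defn.dst, imm=defn.imm)


if __name__ == "__main__":
    d = Instr("y", "add", "x", 5)          # y = x + 5
    regs = execute([d], {"x": 37})
    del regs["x"]                          # x's register is reused elsewhere
    regs = execute([reverse_remat(d)], regs)  # x = y - 5
    print(regs["x"])  # 37
```

The trade-off this shows is the one discussed in the abstract: the inverse instruction costs an extra operation at the point of reuse, but it frees a register for the whole interval in which `x` would otherwise have been live, without any memory traffic.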
Cite this article
Bahi, M., Eisenbeis, C. Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing. Int J Parallel Prog 42, 49–76 (2014). https://doi.org/10.1007/s10766-012-0212-y