Aggressive Function Inlining: Preventing Loop Blockings in the Instruction Cache

Ben Asher, Yosi; Boehm, Omer; Citron, Daniel; Haber, Gadi; Klausner, Moshe; Levin, Roy; Shajrawi, Yousef

doi:10.1007/978-3-540-77560-7_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4917)

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

Aggressive function inlining can lead to significant improvements in execution time. This potential is reduced by extensive instruction cache (Icache) misses caused by the resulting code expansion. It is very difficult to predict which inlinings cause Icache conflicts, as the exact location of code in the executable depends on completing the inlining first. In this work we propose a new method for selective inlining called “Icache Loop Blockings” (ILB). In ILB we only allow inlinings that do not create multiple inlined copies of the same function in hot execution cycles. This prevents any increase in the Icache footprint. This method is significantly more aggressive than previous ones, and experiments show that it also performs better.
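
The selection rule stated in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is one possible reading of the rule, in which a call site is inlined only if the hot cycle containing it does not already hold an inlined copy of the same callee. The `CallSite` record, the `select_inlinings` function, the loop and function names, and the assumption that candidates arrive ordered hottest first are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class CallSite(NamedTuple):
    """Hypothetical record: `callee` is called at `site_id` inside hot cycle `loop`."""
    site_id: str
    loop: str
    callee: str


def select_inlinings(candidates):
    """Greedily accept call sites so that each hot cycle ends up with
    at most one inlined copy of any given callee (assumed reading of ILB)."""
    inlined_in_loop = defaultdict(set)  # hot cycle -> callees already inlined there
    accepted = []
    for site in candidates:  # assumed to be ordered hottest first
        if site.callee in inlined_in_loop[site.loop]:
            # A second copy in the same hot cycle would enlarge its footprint.
            continue
        inlined_in_loop[site.loop].add(site.callee)
        accepted.append(site)
    return accepted


candidates = [
    CallSite("s1", "loop_A", "f"),
    CallSite("s2", "loop_A", "f"),  # second call to f in the same hot cycle
    CallSite("s3", "loop_A", "g"),
    CallSite("s4", "loop_B", "f"),  # f again, but in a different hot cycle
]
chosen = select_inlinings(candidates)
print([s.site_id for s in chosen])  # prints ['s1', 's3', 's4']
```

In this toy model, site `s2` is rejected because `loop_A` already contains an inlined copy of `f`, while `s4` is accepted because it lies in a different hot cycle. Whether the paper treats copies in distinct cycles this way is an assumption of the sketch.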

Results on a server-level processor and on an embedded CPU, running SPEC CINT2000, show that the ILB scheme improves execution time by 10% compared with other inlining methods. This was achieved without bloating the size of the hot code executed at any single point of execution, which is crucial for the embedded processor domain.

We have also considered the synergy between code reordering and inlining, focusing on how inlining can help code reordering. This aspect of inlining has not been studied in previous work.

Author information

Authors and Affiliations

IBM Research Lab in Haifa, Israel; Computer Science Department, Haifa University, Haifa, Israel
Yosi Ben Asher, Omer Boehm, Daniel Citron, Gadi Haber, Moshe Klausner, Roy Levin & Yousef Shajrawi

Editor information

Per Stenström, Michel Dubois, Manolis Katevenis, Rajiv Gupta, Theo Ungerer

About this paper

Cite this paper

Ben Asher, Y. et al. (2008). Aggressive Function Inlining: Preventing Loop Blockings in the Instruction Cache. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_26

DOI: https://doi.org/10.1007/978-3-540-77560-7_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77559-1
Online ISBN: 978-3-540-77560-7
eBook Packages: Computer Science, Computer Science (R0)
