Abstract
Branch predictors raise critical design issues for today's instruction-hungry processors. We study two important domains in which the optimization of decision trees, implemented through switch-case or nested if-then-else constructs, makes precise modeling of these hardware mechanisms decisive for performance: compute-intensive libraries with versioning and cloning, and high-performance interpreters. Contrary to common belief, the complexity of recent microarchitectures does not necessarily hamper the design of accurate cost models in the special case of decision trees. We build a simple model that illustrates why decision tree performance is predictable.
Based on this model, we compare the most significant code generation strategies on the Itanium 2 processor. We show that no strategy dominates in all cases and that indirect branches, although penalized by traditional superscalar processors, regain considerable interest in the context of predicated execution and delayed branches. We validate our study with an improvement of 15% to 40% over the Intel ICC compiler for a Daxpy code focused on short vectors.
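To make the two families of strategies concrete, the following is a minimal sketch in C, not the authors' implementation. It dispatches a Daxpy on short vectors either through a nested if-then-else tree or through an indirect call via a table of function pointers. All names (daxpy0 to daxpy4, daxpy_loop, daxpy_if_tree, daxpy_indirect, kernels) and the choice of four specialized lengths are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical specialized versions: each computes y[i] += a * x[i]
   for one fixed, very short vector length. */
static void daxpy0(double a, const double *x, double *y) { (void)a; (void)x; (void)y; }
static void daxpy1(double a, const double *x, double *y) { y[0] += a * x[0]; }
static void daxpy2(double a, const double *x, double *y) {
    y[0] += a * x[0]; y[1] += a * x[1];
}
static void daxpy3(double a, const double *x, double *y) {
    y[0] += a * x[0]; y[1] += a * x[1]; y[2] += a * x[2];
}
static void daxpy4(double a, const double *x, double *y) {
    y[0] += a * x[0]; y[1] += a * x[1]; y[2] += a * x[2]; y[3] += a * x[3];
}

/* Generic fallback for lengths that have no specialized version. */
static void daxpy_loop(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Strategy 1: decision tree written as nested if-then-else.
   Every level is a conditional branch that the predictor must guess. */
void daxpy_if_tree(size_t n, double a, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    if (n <= 2) {
        if (n == 0)      daxpy0(a, x, y);
        else if (n == 1) daxpy1(a, x, y);
        else             daxpy2(a, x, y);
    } else {
        if (n == 3)      daxpy3(a, x, y);
        else if (n == 4) daxpy4(a, x, y);
        else             daxpy_loop(n, a, x, y);
    }
}

/* Strategy 2: one bounds check, then a single indirect branch
   through a table indexed by the vector length. */
typedef void (*kernel_fn)(double, const double *, double *);
static const kernel_fn kernels[] = { daxpy0, daxpy1, daxpy2, daxpy3, daxpy4 };
#define NUM_KERNELS (sizeof kernels / sizeof kernels[0])

void daxpy_indirect(size_t n, double a, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    if (n < NUM_KERNELS)
        kernels[n](a, x, y);
    else
        daxpy_loop(n, a, x, y);
}
```

Both dispatchers compute the same result. They differ only in the branch structure the compiler is likely to emit: a chain of conditional branches in the first case, a single indirect branch in the second. Which one is faster depends on the target and on the distribution of n, which is the kind of trade-off the abstract describes.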
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carribault, P., Lemuet, C., Acquaviva, JT., Cohen, A., Jalby, W. (2005). Branch Strategies to Optimize Decision Trees for Wide-Issue Architectures. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_31
DOI: https://doi.org/10.1007/11532378_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28009-5
Online ISBN: 978-3-540-31813-2
eBook Packages: Computer Science, Computer Science (R0)