Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations

Abboud, Fadi; Ben-Asher, Yosi; Shajrawi, Yousef; Stein, Esti

doi:10.1007/s10766-012-0196-7

Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations

Published: 10 May 2012

Volume 40, pages 488–513, (2012)
Cite this article

International Journal of Parallel Programming Aims and scope Submit manuscript

Fadi Abboud¹,
Yosi Ben-Asher¹,
Yousef Shajrawi² &
…
Esti Stein³

160 Accesses
Explore all metrics

Abstract

In here we consider a technique to automatically extract three-argument instructions from sequential arithmetic code. The instructions include: multiply and add, three argument additions and three argument multiplications (MUL3). The proposed solution combines a height reduction technique that generates three-argument instructions and a VLIW scheduling that can benefit from these instructions. The proposed height reduction technique is based on a known theoretical algorithm (MRK) that in some cases can evaluate an algebraic circuit faster than its depth. We modified the MRK algorithm to generate less instructions and emit VLIW instructions. The modified MRK algorithm was implemented in the LLVM compiler and the potential usefulness was measured. Our results show that for arithmetic benchmarks the proposed technique can improve the VLIW scheduling while emitting three-argument instructions. The contribution of this work includes: the modified MRK algorithm as a new technique for height reduction optimizations and the study of the potential usefulness of three-argument instructions. Though our results are for a non existing hardware they show the usefulness of adding such instructions to VLIW CPUs. Note that a previous research showed that MUL3 can be executed as fast as MUL2.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Dartagnan: Leveraging Compiler Optimizations and the Price of Precision (Competition Contribution)

GVLE: a highly optimized GPU-based implementation of variable-length encoding

Article 18 December 2022

Speeding Up Non-archimedean Numerical Computations Using AVX-512 SIMD Instructions

References

Arm instruction set, arm7tdmi data sheet. Technical report(2009)
Ben Asher, Y., Stein, E.: Extending booth algorithm to multiplications of three numbers on the fpga. In: IEEE International Conference on Field-Programmable Technology, Taiwan (2008)
Cortadella, J.: Bi-decomposition and tree-height reduction for timing optimization. In: IEEE/ACM International Workshop on Logic and Synthesis (2002)
De Souza A.F., Rounce P.: Dynamically scheduling VLIW instructions. J. Parallel Distrib. Comput. 60(12), 1480–1511 (2000)
Article MATH Google Scholar
Freescale Semiconductor, Inc., SC140 DSP Core Reference Manual (2005)
Miller G.L., Ramachandran V., Kaltofen E.: Efficient parallel evaluation of straight-line code and arithmetic circuits. SIAM J. Comput. 17(4), 687–695 (1988)
Article MathSciNet MATH Google Scholar
Ghuloum, A.M., Fisher, A.L.: Parallelizing complex scans and reductions, pp. 135–146 (1994)
Haney, R., Meuse, T., Kepner, J., Lebak, J.: Hpec challenge overview. In: Ninth Anuual High-Performance Embedded Computing Workshop (2005)
Intel 64 and ia-32 architectures, software developers manual, instruction set reference. Technical report, Intel (2011)
Jaja J.: An Introduction to Parallel Algorithms. Addison-Wesley, Boston (1992)
MATH Google Scholar
Lattner, C., Adve, V. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04), Palo Alto, California (2004)
Lee, C., Lee, J.K., Hwang, T.T., Tsai, S.C.: Compiler optimization on instruction scheduling for low power. In: International Symposium on System Synthesis (ISSS), p 55, IEEE Computer Society (2000)
Mangalam, G.N., Narayan, S., van Besouw, P., Avra, L., Mathur, A., Saluja, S.: Graph transformations for improved tree height reduction. In: VLSID ’03: Proceedings of the 16th International Conference on VLSI Design (2003)
McMahon, F.H.: The livermore Fortran Kernels Test of the Numerical Performance Range. Performance Evaluation of Supercomputers. Elsevier, New York, pp. 143–186 (1988)
Muchnick S.: Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, Waltham (1997)
Google Scholar
Nakatani, T., Ebcioglu, K.: Combining as a compilation technique for vliw architectures. In: Proceedings of MICRO-22, pp. 43–55 (1989)
Nicolau, A., Potasmann, R.: Incremental tree height reduction for high level synthesis. In: DAC ’91: Proceedings of the 28th conference on ACM/IEEE design automation, pp. 770–774 (1991)
Reif J.: Synthesis of Parallel Algorithms. Morgan Kaufmann, Waltham (1993)
Google Scholar
Revol N., Roch J.: Parallel evaluation of arithmetic circuits. Theor. Comput. Sci. 162, 133–150 (1996)
Article MathSciNet MATH Google Scholar
Santos R., Azevedo R., Araujo G.: Instruction scheduling based on subgraph isomorphism for a high performance computer processor. J. Univers. Comput. Sci. 14(21), 3465–3480 (2008)
Google Scholar
Schlansker, M., Kathail, V., Anik, S.: Height reduction of control recurrences for ilp processors. In: MICRO 27: Proceedings of the 27th Annual International Symposium on Microarchitecture (1994)
Sethi, R., Aho, A.V., Lam, M.S., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Boston, 1986 (2006)
Shobaki G., Wilken K., Heffernan M.: Optimal trace scheduling using enumeration. ACM Trans. Architect. Code Optim. (TACO) 5(4), 1–32 (2009)
Article Google Scholar
Steven, F.L., Potter, R.D., Steven, G.B., Vintan, L., Romania, S.: Static data dependence collapsing in a high performance superscalar architecture. In: The 3-rd International Conference on Massively Parallel Computing Systems (MPCS ’98) (1998)
Stpiczynski, P.: Parallel algorithms for solving linear recurrence systems. Parallel Processing: CONPAR 92-VAPP V. In: Second Joint International Conference on Vector and Parallel Processing, pp. 343–348 (1992)
Texas Instruments, Inc., TMS320C67x/C67x CPU and Instruction Set Reference Guide (2006)
Zory, J., Coelho, F.: Using algebraic transformations to optimize expression evaluation in scientific code. In: PACT ’98: Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (1998)

Download references

Author information

Authors and Affiliations

CS. Haifa University, Haifa, Israel
Fadi Abboud & Yosi Ben-Asher
IBM, HRL Haifa, Haifa, Israel
Yousef Shajrawi
Braude College, Carmiel, Israel
Esti Stein

Authors

Fadi Abboud
View author publications
You can also search for this author in PubMed Google Scholar
Yosi Ben-Asher
View author publications
You can also search for this author in PubMed Google Scholar
Yousef Shajrawi
View author publications
You can also search for this author in PubMed Google Scholar
Esti Stein
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yosi Ben-Asher.

Additional information

This research was supported by THE ISRAEL SCIENCE FOUNDATION grant No. 585/09 and Israel Ministry of Science and Technology (MOST) grant No. 3-6496.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Abboud, F., Ben-Asher, Y., Shajrawi, Y. et al. Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations. Int J Parallel Prog 40, 488–513 (2012). https://doi.org/10.1007/s10766-012-0196-7

Download citation

Received: 16 December 2010
Accepted: 17 April 2012
Published: 10 May 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s10766-012-0196-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations

Abstract

Access this article

Similar content being viewed by others

Dartagnan: Leveraging Compiler Optimizations and the Price of Precision (Competition Contribution)

GVLE: a highly optimized GPU-based implementation of variable-length encoding

Speeding Up Non-archimedean Numerical Computations Using AVX-512 SIMD Instructions

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations

Abstract

Access this article

Similar content being viewed by others

Dartagnan: Leveraging Compiler Optimizations and the Price of Precision (Competition Contribution)

GVLE: a highly optimized GPU-based implementation of variable-length encoding

Speeding Up Non-archimedean Numerical Computations Using AVX-512 SIMD Instructions

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation