research-article
DOI: 10.1145/1356058.1356085

Compiling for vector-thread architectures

Published: 06 April 2008

Abstract

Vector-thread (VT) architectures exploit multiple forms of parallelism simultaneously. This paper describes a compiler for the Scale VT architecture, which takes advantage of the VT features. We focus on compiling loops, and show how the compiler can transform code that poses difficulties for traditional vector or VLIW processors, such as loops with internal control flow or cross-iteration dependences, while still taking advantage of features not supported by multithreaded designs, such as vector memory instructions. We evaluate the compiler using several embedded benchmarks and show that we can obtain substantial speedups over a single-issue, in-order scalar machine.
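
As an illustration of the two loop shapes the abstract names, here is a minimal C sketch. It is not taken from the paper or its benchmarks; the function names `clamp_negative` and `running_sum` are hypothetical and chosen only for this example. The first loop has internal control flow (a data-dependent branch in the body), and the second has a cross-iteration dependence (each iteration consumes a value produced by the previous one).

```c
#include <stddef.h>

/* Loop with internal control flow: the branch taken depends on the
 * data in each iteration, so different iterations follow different
 * paths through the loop body. Iterations are otherwise independent. */
void clamp_negative(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0)
            out[i] = 0;          /* path A: negative inputs become 0 */
        else
            out[i] = in[i] * 2;  /* path B: other inputs are doubled */
    }
}

/* Loop with a cross-iteration dependence: iteration i needs the
 * accumulator value left behind by iteration i-1, so the iterations
 * cannot all run independently. */
void running_sum(const int *in, int *out, size_t n) {
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];            /* depends on acc from the previous iteration */
        out[i] = acc;
    }
}
```

Calling `clamp_negative` on `{3, -1, 4, -5}` yields `{6, 0, 8, 0}`, and `running_sum` on the same input yields `{3, 2, 6, 1}`. How the Scale compiler actually maps such loops onto the hardware is described in the paper itself, not in this sketch.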


Published In

CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
April 2008
235 pages
ISBN:9781595939784
DOI:10.1145/1356058

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code generation
  2. compilers
  3. vector processors

Qualifiers

  • Research-article

Conference

CGO '08

Acceptance Rates

CGO '08 Paper Acceptance Rate 21 of 66 submissions, 32%;
Overall Acceptance Rate 312 of 1,061 submissions, 29%

Cited By

  • (2023) OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs. Proceedings of the 37th International Conference on Supercomputing, pages 398-409. DOI: 10.1145/3577193.3593735. Online publication date: 21-Jun-2023
  • (2021) Temporal vectorization for stencils. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-13. DOI: 10.1145/3458817.3476149. Online publication date: 14-Nov-2021
  • (2018) Loop-nest Auto-vectorization Method Based on Benefit Analysis. Proceedings of the 2nd International Conference on Advances in Image Processing, pages 240-244. DOI: 10.1145/3239576.3239620. Online publication date: 16-Jun-2018
  • (2016) FlexVec: auto-vectorization for irregular loops. ACM SIGPLAN Notices, 51:6, pages 697-710. DOI: 10.1145/2980983.2908111. Online publication date: 2-Jun-2016
  • (2016) FlexVec: auto-vectorization for irregular loops. Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 697-710. DOI: 10.1145/2908080.2908111. Online publication date: 2-Jun-2016
  • (2015) SIMD vectorization of nested loop based on strip mining. 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pages 1-7. DOI: 10.1109/SNPD.2015.7176176. Online publication date: Jun-2015
  • (2013) Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators. ACM Transactions on Computer Systems, 31:3, pages 1-38. DOI: 10.1145/2491464. Online publication date: 1-Aug-2013
  • (2011) Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators. ACM SIGARCH Computer Architecture News, 39:3, pages 129-140. DOI: 10.1145/2024723.2000080. Online publication date: 4-Jun-2011
  • (2011) Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators. Proceedings of the 38th annual international symposium on Computer architecture, pages 129-140. DOI: 10.1145/2000064.2000080. Online publication date: 4-Jun-2011
  • (2009) Stream Compilation for Real-Time Embedded Multicore Systems. Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 210-220. DOI: 10.1109/CGO.2009.27. Online publication date: 22-Mar-2009
