research-article

LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Authors:
Vasileios Porpodas, University of Edinburgh, Edinburgh, United Kingdom
Marcelo Cintra, University of Edinburgh, Edinburgh, United Kingdom

ACM SIGPLAN Notices, Volume 48, Issue 5, May 2013, pp. 45–54. https://doi.org/10.1145/2499369.2465565

Published: 20 June 2013

Abstract

Clustered VLIW architectures are statically scheduled wide-issue architectures that combine the advantages of wide-issue processors with the power and frequency scalability of clustered designs. Because they are statically scheduled, the compiler must decide how instructions are mapped to clusters. State-of-the-art code generation for such architectures combines cluster assignment and instruction scheduling in a single unified pass. The performance of the generated code, however, is highly sensitive to the inter-cluster communication latency. This sensitivity stems from the two clustering heuristics in use: one is aggressive and works well for low inter-cluster latencies, while the other is more conservative and works well only for high latencies.

In this paper we propose LUCAS, a novel unified cluster-assignment and instruction-scheduling algorithm that adapts to the inter-cluster latency better than existing state-of-the-art schemes. LUCAS is a hybrid scheme that performs fine-grain switching between the two state-of-the-art clustering heuristics, producing better schedules than either heuristic alone. It generates better-performing code across a wide range of inter-cluster latency values.
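
As a rough illustration of what per-instruction switching between an aggressive and a conservative cluster-selection heuristic could look like, consider the Python sketch below. The abstract does not define the two heuristics or the switching rule, so everything in the sketch is an assumption made for illustration: the `Placed` record, the one-next-free-cycle-per-cluster model, the "earliest issue cycle" and "most local predecessors" heuristics, and the rule "leave the predecessors' cluster only if the cycles gained cover one inter-cluster copy". It is not the algorithm from the paper.

```python
from dataclasses import dataclass


@dataclass
class Placed:
    """An already-scheduled predecessor (hypothetical record type)."""
    cluster: int  # cluster that holds the result
    ready: int    # cycle at which the result is available on that cluster


def issue_cycle(preds, cluster, free_at, latency):
    """Earliest cycle an instruction could issue on `cluster`.

    Operands produced on another cluster arrive `latency` cycles later.
    Assumption: each cluster is modelled by a single next-free cycle.
    """
    operands = max(
        (p.ready + (latency if p.cluster != cluster else 0) for p in preds),
        default=0,
    )
    return max(operands, free_at[cluster])


def pick_aggressive(preds, free_at, latency):
    """Assumed aggressive heuristic: cluster with the earliest issue cycle."""
    return min(range(len(free_at)),
               key=lambda c: (issue_cycle(preds, c, free_at, latency), c))


def pick_conservative(preds, free_at, latency):
    """Assumed conservative heuristic: cluster holding the most predecessors.

    `latency` is unused; it is kept so both heuristics share one signature.
    """
    def key(c):
        local = sum(p.cluster == c for p in preds)
        return (-local, free_at[c], c)  # most local operands, then least busy
    return min(range(len(free_at)), key=key)


def pick_hybrid(preds, free_at, latency):
    """Hypothetical per-instruction switch between the two heuristics.

    Take the aggressive choice only when the cycles it gains are at least
    the inter-cluster latency; otherwise stay with the conservative choice.
    """
    a = pick_aggressive(preds, free_at, latency)
    c = pick_conservative(preds, free_at, latency)
    gain = (issue_cycle(preds, c, free_at, latency)
            - issue_cycle(preds, a, free_at, latency))
    return a if gain >= latency else c


# One operand on cluster 0 (ready at cycle 2); cluster 0 is busy until cycle 5.
preds = [Placed(cluster=0, ready=2)]
free_at = [5, 0]
choices = {lat: pick_hybrid(preds, free_at, lat) for lat in (1, 2)}
# choices == {1: 1, 2: 0}
```

In this toy case the hybrid follows the aggressive choice (cluster 1) at a latency of 1 cycle and falls back to the conservative choice (cluster 0) at a latency of 2, which is the kind of latency-dependent behaviour the abstract describes.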


Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation


Published in
ACM SIGPLAN Notices Volume 48, Issue 5
LCTES '13
May 2013
165 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2499369
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
June 2013
184 pages
ISBN:9781450320856
DOI:10.1145/2491899
General Chair: Björn Franke, University of Edinburgh, UK
Program Chair: Jingling Xue, University of New South Wales, Australia
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cluster assignment
clustered VLIW
instruction scheduling
