Abstract
Clustered VLIW architectures are statically scheduled wide-issue architectures that combine the advantages of wide-issue processors with the power and frequency scalability of clustered designs. Because they are statically scheduled, the compiler must decide how instructions are mapped to clusters. State-of-the-art code generation for such architectures combines cluster assignment and instruction scheduling in a single unified pass. The performance of the generated code, however, is highly sensitive to the inter-cluster communication latency. This is due to the nature of the two clustering heuristics used: one is aggressive and works well for low inter-cluster latencies, while the other is more conservative and works well only for high latencies.
In this paper we propose LUCAS, a novel unified cluster-assignment and instruction-scheduling algorithm that adapts to the inter-cluster latency better than the existing state-of-the-art schemes. LUCAS is a hybrid scheme that performs fine-grain switching between the two state-of-the-art clustering heuristics, leading to better schedules than either heuristic produces alone. It generates better-performing code for a wide range of inter-cluster latency values.
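To make the idea of fine-grain switching concrete, the following is a toy sketch and not the LUCAS algorithm itself. The abstract does not specify either heuristic or the switching criterion, so everything here is an assumption made for illustration: the machine model (unit-latency instructions, one issue slot per cluster per cycle), the `aggressive` rule (earliest start cycle), the `conservative` rule (stay with the predecessors), and the `hybrid` rule (take the aggressive choice only when it saves at least one inter-cluster latency).

```python
from dataclasses import dataclass, field

@dataclass
class State:
    n_clusters: int
    latency: int                                    # inter-cluster latency in cycles
    cluster_of: dict = field(default_factory=dict)  # instruction -> cluster
    finish: dict = field(default_factory=dict)      # instruction -> cycle result is ready
    busy: set = field(default_factory=set)          # occupied (cluster, cycle) issue slots

def start_cycle(st, preds, c):
    """Earliest cycle at which an instruction with these predecessors can issue on cluster c."""
    ready = 0
    for p in preds:
        comm = 0 if st.cluster_of[p] == c else st.latency
        ready = max(ready, st.finish[p] + comm)
    while (c, ready) in st.busy:                    # assumed: one issue slot per cluster per cycle
        ready += 1
    return ready

def aggressive(st, preds):
    """Assumed aggressive rule: cluster with the earliest start cycle (ties: lowest index)."""
    return min(range(st.n_clusters), key=lambda c: (start_cycle(st, preds, c), c))

def conservative(st, preds):
    """Assumed conservative rule: cluster holding the most predecessors (ties: lowest index)."""
    return max(range(st.n_clusters),
               key=lambda c: (sum(st.cluster_of[p] == c for p in preds), -c))

def hybrid(st, preds):
    """Hypothetical per-instruction switch: spread out only if the cycles saved
    now are at least one inter-cluster latency."""
    a, k = aggressive(st, preds), conservative(st, preds)
    gain = start_cycle(st, preds, k) - start_cycle(st, preds, a)
    return a if gain >= st.latency else k

def schedule(dag, n_clusters, latency, pick):
    """List-schedule dag ({instr: [preds]}, keys in topological order).
    Returns (state, schedule length in cycles)."""
    st = State(n_clusters, latency)
    for instr, preds in dag.items():
        c = pick(st, preds)
        t = start_cycle(st, preds, c)
        st.busy.add((c, t))
        st.cluster_of[instr] = c
        st.finish[instr] = t + 1                    # assumed: unit-latency instructions
    return st, max(st.finish.values())

# Two independent chains that join in a final instruction.
DAG = {"a": [], "b": [], "c": ["a"], "d": ["b"], "e": ["c", "d"]}

for lat in (0, 1, 4):
    lengths = {h.__name__: schedule(DAG, 2, lat, h)[1]
               for h in (aggressive, conservative, hybrid)}
    print(lat, lengths)
```

Running the loop prints the schedule length of each rule at three latency values. On this small graph the aggressive rule wins at low latency and the conservative rule wins at high latency, which mirrors the sensitivity described above; the hybrid rule is only one of many possible ways to switch between them.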
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems