research-article

LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Published: 20 June 2013

Abstract

Clustered VLIW architectures are statically scheduled wide-issue architectures that combine the advantages of wide-issue processors with the power and frequency scalability of clustered designs. Because they are statically scheduled, the compiler must decide how to map instructions to clusters. State-of-the-art code generation for such architectures combines cluster assignment and instruction scheduling in a single unified pass. The performance of the generated code, however, is highly sensitive to the inter-cluster communication latency. This sensitivity stems from the two clustering heuristics in use: one is aggressive and works well for low inter-cluster latencies, while the other is more conservative and works well only for high latencies.

In this paper we propose LUCAS, a novel unified cluster-assignment and instruction-scheduling algorithm that adapts to the inter-cluster latency better than existing state-of-the-art schemes. LUCAS is a hybrid scheme that performs fine-grain switching between the two state-of-the-art clustering heuristics, leading to better schedules than either heuristic alone. It generates better-performing code across a wide range of inter-cluster latency values.
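The abstract does not spell out the two heuristics or the switching rule, so the following is only a toy sketch of the general idea, not the LUCAS algorithm itself. It assumes single-issue clusters, unit-latency operations, and instructions supplied in topological order. The names `Instr`, `schedule`, and `length` are hypothetical, as are the stand-in heuristics: "aggressive" picks the least-loaded cluster, "conservative" picks the cluster holding the most operands, and "hybrid" chooses per instruction whichever of the two candidates allows the earlier start cycle.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    preds: list = field(default_factory=list)  # names of producer instructions


def schedule(instrs, n_clusters, latency, policy):
    """Toy unified assign-and-schedule pass (illustrative assumption only).

    Returns a dict mapping instruction name -> (cluster, issue cycle).
    """
    placed = {}
    free_at = [0] * n_clusters  # next free issue cycle of each cluster

    def start_cycle(ins, c):
        # Earliest cycle at which `ins` can issue on cluster `c`.
        ready = 0
        for p in ins.preds:
            p_cluster, p_cycle = placed[p]
            # A value produced on another cluster pays the copy latency.
            copy = latency if p_cluster != c else 0
            ready = max(ready, p_cycle + 1 + copy)
        return max(ready, free_at[c])

    for ins in instrs:
        def local_preds(c):
            return sum(1 for p in ins.preds if placed[p][0] == c)

        # Stand-in "aggressive" choice: spread work onto the least-loaded cluster.
        aggressive = min(range(n_clusters), key=lambda c: free_at[c])
        # Stand-in "conservative" choice: stay where the operands already are.
        conservative = max(range(n_clusters), key=local_preds)

        if policy == "aggressive":
            cluster = aggressive
        elif policy == "conservative":
            cluster = conservative
        else:
            # Hybrid: per-instruction switch to the candidate that starts earlier.
            cluster = min((aggressive, conservative),
                          key=lambda c: start_cycle(ins, c))

        cycle = start_cycle(ins, cluster)
        placed[ins.name] = (cluster, cycle)
        free_at[cluster] = cycle + 1
    return placed


def length(placed):
    """Schedule length in cycles."""
    return max(cycle for _, cycle in placed.values()) + 1


# Two small inputs: a dependence chain and four independent instructions.
chain = [Instr("a"), Instr("b", ["a"]), Instr("c", ["b"]), Instr("d", ["c"])]
independent = [Instr(n) for n in "wxyz"]
```

With two clusters and an inter-cluster latency of 2 cycles, the stand-in aggressive heuristic scatters the chain across clusters and pays a copy on every edge, while the conservative one serializes the independent instructions on a single cluster. The per-instruction hybrid matches the better of the two on both inputs, which is the qualitative behaviour the abstract attributes to fine-grain switching.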


Published in

ACM SIGPLAN Notices, Volume 48, Issue 5 (LCTES '13), May 2013, 165 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499369

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, June 2013, 184 pages
ISBN: 9781450320856
DOI: 10.1145/2491899

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher: Association for Computing Machinery, New York, NY, United States
