ABSTRACT
Clustering is an effective way to increase the available parallelism in VLIW datapaths without incurring the severe penalties associated with a large number of register file ports. Efficient utilization of a clustered datapath requires careful binding of operations to clusters. This paper proposes a binding algorithm that effectively explores the tradeoffs between in-cluster operation serialization and the delays associated with data transfers between clusters. Extensive experimental evidence shows that the algorithm generates high-quality solutions for basic blocks, with up to 29% improvement over a state-of-the-art binding algorithm.
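The tradeoff named in the abstract can be illustrated with a small sketch. This is not the paper's binding algorithm; it is a toy cost model that only evaluates the schedule length of a given binding. Every name here (`schedule_length`, `transfer_delay`, the example operations) is hypothetical, and the model assumes one single-cycle functional unit per cluster, a fixed inter-cluster transfer latency, and transfers that occupy no functional unit or bus slot.

```python
from collections import defaultdict


def schedule_length(ops, deps, binding, transfer_delay=1):
    """Schedule length of a basic block under a given operation binding.

    Assumptions (illustrative only, not taken from the paper):
      - ops is listed in a topological order of the dataflow graph
      - each cluster has one functional unit; every operation takes 1 cycle
      - a value crossing clusters arrives transfer_delay cycles late
      - operations are placed greedily in the order given
    """
    finish = {}               # op -> cycle at which its result is available
    busy = defaultdict(set)   # cluster -> cycles already occupied
    for op in ops:
        cluster = binding[op]
        ready = 0
        for pred in deps.get(op, []):
            arrival = finish[pred]
            if binding[pred] != cluster:
                arrival += transfer_delay   # inter-cluster data transfer
            ready = max(ready, arrival)
        cycle = ready
        while cycle in busy[cluster]:       # in-cluster serialization
            cycle += 1
        busy[cluster].add(cycle)
        finish[op] = cycle + 1
    return max(finish.values())


# Hypothetical basic block: e = f(a, b), f = g(c, d), g = h(e, f)
OPS = ["a", "b", "c", "d", "e", "f", "g"]
DEPS = {"e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}

ONE_CLUSTER = {op: 0 for op in OPS}
TWO_CLUSTERS = {"a": 0, "b": 0, "e": 0, "g": 0, "c": 1, "d": 1, "f": 1}

single = schedule_length(OPS, DEPS, ONE_CLUSTER)
split_fast = schedule_length(OPS, DEPS, TWO_CLUSTERS, transfer_delay=1)
split_slow = schedule_length(OPS, DEPS, TWO_CLUSTERS, transfer_delay=4)
print(single, split_fast, split_slow)
```

Under these assumptions, binding everything to one cluster serializes all seven operations (7 cycles). Splitting the two subtrees across clusters shortens the schedule to 5 cycles when a transfer costs one cycle, but lengthens it to 8 cycles when a transfer costs four. A binding algorithm has to weigh exactly this kind of serialization against transfer delay.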
Index Terms
- High-quality operation binding for clustered VLIW datapaths