Instruction scheduling methods and phase ordering framework for VLIW DSP processors with distributed register files

Wu, Chung-Ju; Lin, Yu-Te; Lee, Jenq-Kuen

doi:10.1007/s11227-011-0671-8

Published: 30 August 2011

Volume 61, pages 1024–1047, (2012)

The Journal of Supercomputing

Abstract

Digital signal processors (DSPs) with very long instruction word (VLIW) data-path architectures are increasingly deployed on embedded devices for video and other multimedia processing applications. To reduce the power consumption and design cost of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports associated with register files. This presents challenges for compilers attempting to generate efficient code. In this paper we present an instruction scheduling method and phase ordering framework for such an architecture, based on the well-known PALF scheme. The PALF scheme first performs bank partitioning, followed by register allocation and then instruction scheduling. Our contribution includes the insertion of a pseudo instruction scheduler that performs bank assignment analysis before PALF assignment. We also enhance the PALF scheme by utilizing the program graph with cycle information generated by our pseudo scheduler. Finally, a ping-pong-aware scheduling policy is used in the scheduling phases to address the issue of limited temporal connectivity among register banks in DSP processors. Experiments were performed on an instruction set simulator for Parallel Architecture Core DSP processors, with a compiler based on the Open64 compiler infrastructure. Preliminary experiments with the EEMBC and MiBench benchmarks show that a compiler based on our proposed scheme for handling the hardware constraints of VLIW scheduling on distributed register files exhibits performance superior to that of the PALF scheme.
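The phase ordering described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the phase names, the function asap_cycles, and the choice of as-soon-as-possible (ASAP) cycle numbers as the "cycle information" are all hypothetical, since the abstract does not specify how the pseudo scheduler computes its annotations. The dependence graph is assumed to be acyclic.

```python
# Illustrative sketch only. Phase names and the ASAP heuristic are
# assumptions made for this example; the abstract does not define them.

# PALF ordering as stated in the abstract.
PALF_ORDER = ["bank_partitioning", "register_allocation", "instruction_scheduling"]

# Proposed ordering: a pseudo scheduling pass runs before the PALF phases.
PROPOSED_ORDER = ["pseudo_scheduling"] + PALF_ORDER


def asap_cycles(deps, latency=1):
    """Annotate each operation with its earliest possible issue cycle.

    deps maps an operation name to the list of operations it depends on.
    The graph is assumed to be a DAG; a uniform latency is assumed.
    """
    cycles = {}

    def visit(op):
        if op not in cycles:
            preds = deps.get(op, [])
            # An operation with no predecessors can issue in cycle 0.
            cycles[op] = max((visit(p) + latency for p in preds), default=0)
        return cycles[op]

    for op in deps:
        visit(op)
    return cycles


if __name__ == "__main__":
    # Hypothetical four-operation dependence graph.
    deps = {"load_a": [], "load_b": [], "mul": ["load_a", "load_b"], "store": ["mul"]}
    print(PROPOSED_ORDER)
    print(asap_cycles(deps))
```

In this sketch, operations that share a cycle number are candidates for parallel issue, which is the kind of information a later bank partitioning phase could take into account when deciding where operands should live.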

Author information

Authors and Affiliations

Department of Computer Science, National Tsing-Hua University, Hsinchu, 30013, Taiwan
Chung-Ju Wu, Yu-Te Lin & Jenq-Kuen Lee

Corresponding author

Correspondence to Jenq-Kuen Lee.

About this article

Cite this article

Wu, CJ., Lin, YT. & Lee, JK. Instruction scheduling methods and phase ordering framework for VLIW DSP processors with distributed register files. J Supercomput 61, 1024–1047 (2012). https://doi.org/10.1007/s11227-011-0671-8

Issue Date: September 2012