
Instruction scheduling methods and phase ordering framework for VLIW DSP processors with distributed register files

The Journal of Supercomputing

Abstract

Digital signal processors (DSPs) with very long instruction word (VLIW) data-path architectures are increasingly being deployed on embedded devices for video and other multimedia processing applications. To reduce the power consumption and design cost of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports associated with register files. This presents challenges for compilers attempting to generate efficient code. In this paper we present an instruction scheduling method and phase ordering framework for such an architecture, based on the well-known PALF scheme. The PALF scheme first performs bank partitioning, followed by register allocation and then instruction scheduling. Our contribution includes the insertion of a pseudo instruction scheduler that performs bank assignment analysis before PALF assignment. We also enhance the PALF scheme by utilizing the program graph with cycle information generated by our pseudo scheduler. Finally, a ping-pong-aware scheduling policy is used in the scheduling phases to address the limited temporal connectivity among register banks in DSP processors. Experiments were performed on an instruction set simulator for Parallel Architecture Core DSP processors, using a compiler based on the Open64 compiler infrastructure. Preliminary experiments with the EEMBC and MiBench benchmarks show that a compiler based on our proposed scheme for handling the hardware constraints of VLIW scheduling on distributed register files outperforms the PALF scheme.
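The phase ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phase names, the `pseudo_schedule` helper, the uniform one-cycle latency, and the unlimited issue width are all assumptions made for illustration. The sketch only shows how a pseudo scheduling pass run ahead of the PALF phases could annotate a dependence graph with cycle information for later phases to consult.

```python
from graphlib import TopologicalSorter

# Hypothetical phase names; the abstract gives only their relative order.
PALF_PHASES = ["bank_partitioning", "register_allocation", "instruction_scheduling"]
PROPOSED_PHASES = ["pseudo_scheduling"] + PALF_PHASES


def pseudo_schedule(deps, latency=1):
    """Return the earliest issue cycle (ASAP) of each instruction.

    deps maps an instruction name to the instructions it depends on.
    Assumption: unlimited issue slots and one uniform latency, which a
    real VLIW DSP scheduler would replace with its machine model.
    """
    cycles = {}
    # static_order() yields every instruction after all of its predecessors.
    for instr in TopologicalSorter(deps).static_order():
        preds = deps.get(instr, ())
        cycles[instr] = max((cycles[p] + latency for p in preds), default=0)
    return cycles


# Toy dependence graph: two loads feed a multiply, then an add and a store.
deps = {
    "load_a": [],
    "load_b": [],
    "mul": ["load_a", "load_b"],
    "add": ["mul", "load_a"],
    "store": ["add"],
}
cycles = pseudo_schedule(deps)
print(PROPOSED_PHASES)
print(cycles)
```

In this toy graph the two loads share cycle 0, so a bank assignment pass that sees the cycle annotations knows they may issue together and can weigh that against the port limits of each register bank.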



Author information


Correspondence to Jenq-Kuen Lee.


About this article

Cite this article

Wu, CJ., Lin, YT. & Lee, JK. Instruction scheduling methods and phase ordering framework for VLIW DSP processors with distributed register files. J Supercomput 61, 1024–1047 (2012). https://doi.org/10.1007/s11227-011-0671-8


  • DOI: https://doi.org/10.1007/s11227-011-0671-8
