Design and Implementation of a High-Performance and Complexity-Effective VLIW DSP for Multimedia Applications

Journal of Signal Processing Systems

Abstract

This paper presents the design and implementation of a novel VLIW digital signal processor (DSP) for multimedia applications. The DSP core uses a distributed, ping-pong register file with restricted access patterns. Compared with the centralized register files found in most VLIW processors, it saves 76.8% of the silicon area and improves access time by 46.9%. Despite these restrictions, its performance (estimated in cycles) is comparable to that of state-of-the-art DSPs for multimedia applications. A hierarchical instruction encoding scheme is also adopted, reducing program sizes to 24.1–26.0%. The DSP has been fabricated in the UMC 0.13 μm 1P8M copper logic process; it operates at 333 MHz and consumes 189 mW. The core measures 3.2 × 3.15 mm², including 160 KB of on-chip SRAM.
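As a rough aid to reading the implementation figures above, the short sketch below derives a few secondary quantities from them. Only the clock rate, power, and core dimensions come from the abstract; the function name `derived_metrics` and the choice of metrics are illustrative assumptions, and the calculation assumes that the 189 mW figure is measured at 333 MHz and applies to the whole core, SRAM included.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 333e6                  # 333 MHz operating frequency
POWER_W = 189e-3                  # 189 mW power consumption
CORE_W_MM, CORE_H_MM = 3.2, 3.15  # core dimensions in mm


def derived_metrics(clock_hz, power_w, width_mm, height_mm):
    """Hypothetical helper: back-of-the-envelope figures derived from the abstract.

    Assumes the quoted power is drawn at the quoted clock rate and is
    spread over the full core area (both are assumptions, not claims
    made by the paper).
    """
    area_mm2 = width_mm * height_mm
    return {
        "core_area_mm2": area_mm2,
        "cycle_time_ns": 1e9 / clock_hz,
        "energy_per_cycle_nj": power_w / clock_hz * 1e9,
        "power_density_mw_per_mm2": power_w * 1e3 / area_mm2,
    }


if __name__ == "__main__":
    metrics = derived_metrics(CLOCK_HZ, POWER_W, CORE_W_MM, CORE_H_MM)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the core occupies about 10.08 mm², one cycle lasts roughly 3 ns, and each cycle costs on the order of 0.57 nJ.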

Author information

Corresponding author

Correspondence to Tay-Jyi Lin.

About this article

Cite this article

Lin, TJ., Chen, SK., Kuo, YT. et al. Design and Implementation of a High-Performance and Complexity-Effective VLIW DSP for Multimedia Applications. J Sign Process Syst Sign Image 51, 209–223 (2008). https://doi.org/10.1007/s11265-007-0061-x

  • DOI: https://doi.org/10.1007/s11265-007-0061-x
