Abstract

Very Long Instruction Word (VLIW) processor architectures for multimedia applications are discussed from an algorithm-, hardware-, and system-based point of view. VLIW processors show high flexibility and processing power, as well as a good utilization of resources by compiler-generated code, but their exclusive exploitation of instruction-level parallelism (ILP) decreases in efficiency as the degree of parallelism increases. This is mainly caused by characteristics of multimedia algorithms, increasing wiring delays, compiler restrictions, and a widening gap between on-chip processing speed and available bandwidth to external memory. As new multimedia applications and standards (e.g., MPEG-4) continue to evolve, the demand for higher processing power will continue. Therefore, parallel processing in all its available forms will have to be exploited to achieve significant performance improvements. We show that, due to the diminishing returns from a further increase in ILP, multimedia applications will benefit more from an additional exploitation of parallelism at the thread level. We examine how simultaneous multithreading (SMT), a novel architectural approach combining VLIW techniques with parallel processing of threads, can efficiently be used to further increase the performance of typical multimedia workloads.

About this article

Cite this article

Berekovic, M., Pirsch, P. & Kneip, J. An Algorithm-Hardware-System Approach to VLIW Multimedia Processors. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 20, 163–180 (1998). https://doi.org/10.1023/A:1008030709840

  • DOI: https://doi.org/10.1023/A:1008030709840
