Abstract
Very Long Instruction Word (VLIW) processor architectures for multimedia applications are discussed from an algorithm, hardware, and system point of view. VLIW processors offer high flexibility and processing power, and compiler-generated code utilizes their resources well, but their exclusive reliance on instruction-level parallelism (ILP) becomes less efficient as the degree of parallelism increases. This is mainly caused by the characteristics of multimedia algorithms, increasing wiring delays, compiler restrictions, and a widening gap between on-chip processing speed and the available bandwidth to external memory. As new multimedia applications and standards such as MPEG-4 continue to evolve, the demand for higher processing power will continue to grow. Parallel processing in all its available forms will therefore have to be exploited to achieve significant performance improvements. We show that, because further increases in ILP yield diminishing returns, multimedia applications will benefit more from additionally exploiting thread-level parallelism. We examine how simultaneous multithreading (SMT), a novel architectural approach that combines VLIW techniques with parallel processing of threads, can be used efficiently to further increase the performance of typical multimedia workloads.
Berekovic, M., Pirsch, P. & Kneip, J. An Algorithm-Hardware-System Approach to VLIW Multimedia Processors. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 20, 163–180 (1998). https://doi.org/10.1023/A:1008030709840
DOI: https://doi.org/10.1023/A:1008030709840