Abstract
This paper addresses the main issues in mapping data-dominated multimedia applications onto Very Long Instruction Word (VLIW) multimedia processors. It presents the main design quality factors of applications realized on this target architecture platform and explores their interactions. Power consumption is the major cost factor, while performance is the overriding constraint. A methodology has been developed that reduces both the power consumed by data transfer and storage, which forms an important part of the system's total power budget, and the execution time of applications realized on VLIW multimedia processors. The methodology applies a number of transformations, mainly oriented towards data transfer and storage optimization, to a high-level description of the target application. The paper focuses on the interaction of the proposed code transformations with the exploitation of subword parallelism (for example, through the special performance-improving arithmetic subword instructions present in modern VLIW multimedia processors). Experimental results from real-life data-dominated multimedia applications demonstrate that the proposed transformations are orthogonal to the exploitation of subword parallelism. A second conclusion is that, for the complete application, the positive impact of the proposed code transformations on performance is typically even larger than that of subword parallelism exploitation. Moreover, the effect of subword parallelism exploitation is enhanced once the proposed code transformations have been applied.
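To make the two ingredients concrete, the following is a minimal sketch and not the paper's methodology. It uses loop fusion as a stand-in for a data transfer and storage optimizing transformation, and a software-emulated packed 8-bit addition as a stand-in for a subword instruction. All function names, the three-input kernel, and the access counters are illustrative assumptions. The counters are only a crude proxy for memory traffic, and the cost of packing is ignored on the assumption that real hardware would load contiguous pixels with a single word-wide access.

```python
# Illustrative model only: kernel, names and access counts are assumptions,
# not taken from the paper.

LANES = 4            # four 8-bit subwords per 32-bit word
LOW7 = 0x7F7F7F7F    # low 7 bits of every lane
TOP1 = 0x80808080    # top bit of every lane


def baseline(a, b, c):
    """Two separate loops communicating through an intermediate array."""
    n = len(a)
    tmp = [0] * n
    out = [0] * n
    accesses = 0
    for i in range(n):                      # loop 1: tmp = a + b
        tmp[i] = (a[i] + b[i]) & 0xFF
        accesses += 3                       # read a, read b, write tmp
    for i in range(n):                      # loop 2: out = tmp + c
        out[i] = (tmp[i] + c[i]) & 0xFF
        accesses += 3                       # read tmp, read c, write out
    return out, accesses


def fused(a, b, c):
    """Loop fusion: the intermediate value stays in a scalar (a register)."""
    n = len(a)
    out = [0] * n
    accesses = 0
    for i in range(n):
        t = (a[i] + b[i]) & 0xFF
        out[i] = (t + c[i]) & 0xFF
        accesses += 4                       # read a, b, c, write out
    return out, accesses


def pack(pixels):
    """Pack 8-bit values into 32-bit words, lane 0 in the low byte."""
    assert len(pixels) % LANES == 0
    return [
        pixels[i] | pixels[i + 1] << 8 | pixels[i + 2] << 16 | pixels[i + 3] << 24
        for i in range(0, len(pixels), LANES)
    ]


def unpack(words):
    """Inverse of pack: split each word back into four 8-bit values."""
    return [(w >> (8 * k)) & 0xFF for w in words for k in range(LANES)]


def add4x8(x, y):
    """Lane-wise modulo-256 add of four packed bytes, no carry across lanes."""
    return ((x & LOW7) + (y & LOW7)) ^ ((x ^ y) & TOP1)


def fused_subword(a, b, c):
    """Fused loop operating on packed words: four pixels per iteration."""
    out_words = []
    accesses = 0
    for wa, wb, wc in zip(pack(a), pack(b), pack(c)):
        out_words.append(add4x8(add4x8(wa, wb), wc))
        accesses += 4                       # word-wide: read a, b, c, write out
    return unpack(out_words), accesses
```

In this toy model the fused loop removes the intermediate array and its accesses, and the packed version then needs a quarter of the remaining (now word-wide) accesses and loop iterations. The two steps compose without interfering with each other, which is the kind of orthogonality the abstract reports, although the model says nothing about the magnitudes measured in the paper.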
Cite this article
Masselos, K., Catthoor, F., Goutis, C.E. et al. Combined Application of Data Transfer and Storage Optimizing Transformations and Subword Parallelism Exploitation for Power Consumption and Execution Time Reduction in VLIW Multimedia Processors. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 37, 53–73 (2004). https://doi.org/10.1023/B:VLSI.0000017003.70829.34
DOI: https://doi.org/10.1023/B:VLSI.0000017003.70829.34