Abstract
A systematic methodology is proposed for applying data transfer and storage optimizing code transformations to high-level descriptions of multimedia systems realized on instruction set processors. A detailed order for applying the different data transfer and storage optimizing transformations is proposed in the context of combined execution time and power optimization. A use methodology is proposed as well, including a number of support steps that allow these transformations to be applied efficiently. Applying the transformation-based methodology moves the main part of the memory accesses from the large background memories (possibly off-chip) to smaller on-chip memories or even to foreground storage (registers). Data cache performance improves, which reduces power consumption in the data memory hierarchy and the related interconnect. Execution time and the power consumed by instruction storage and transfers are reduced as well. Experimental results from several real-life multimedia applications demonstrate the effectiveness of the methodology. The approach has also been applied to realizations on custom hardware processors, with promising results.
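To make the idea of moving accesses from background memory to foreground storage concrete, the following is a minimal C sketch of one data-reuse transformation. It is an illustration only, not the methodology or any specific transformation from the article: the sliding-window kernel, the names `window_sum_naive`, `window_sum_reuse`, `read_background`, the counter `background_reads`, and the window width `TAPS` are all hypothetical. The counter simply tallies reads of the large input array so the two versions can be compared.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4 /* hypothetical window width, chosen for illustration */

/* Illustrative counter: number of reads from the large "background" array. */
unsigned long background_reads = 0;

/* Every access to the large array goes through this helper so it is counted. */
static int read_background(const int *in, size_t i) {
    background_reads++;
    return in[i];
}

/* Original form: each output re-reads TAPS samples from the large array. */
void window_sum_naive(const int *in, int *out, size_t n) {
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        int acc = 0;
        for (size_t k = 0; k < TAPS; k++)
            acc += read_background(in, i + k);
        out[i] = acc;
    }
}

/* Transformed form: a small local buffer (a candidate for registers or
 * on-chip storage) holds the current window, so each input sample is
 * fetched from the large array exactly once. */
void window_sum_reuse(const int *in, int *out, size_t n) {
    assert(n >= TAPS);
    int win[TAPS];

    /* Preload the first TAPS-1 samples into the local buffer. */
    for (size_t k = 0; k + 1 < TAPS; k++)
        win[k % TAPS] = read_background(in, k);

    for (size_t i = 0; i + TAPS <= n; i++) {
        /* Fetch only the one new sample; it overwrites the oldest slot. */
        size_t newest = i + TAPS - 1;
        win[newest % TAPS] = read_background(in, newest);

        int acc = 0;
        for (size_t k = 0; k < TAPS; k++)
            acc += win[k]; /* local reads only, not counted */
        out[i] = acc;
    }
}
```

Under these assumptions, the naive version performs `TAPS * (n - TAPS + 1)` reads of the large array, while the transformed version performs `n`, with identical outputs. Whether the local buffer actually ends up in registers depends on the compiler and the target processor.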
Cite this article
Masselos, K., Catthoor, F., Goutis, C.E. et al. Systematic Application of Data Transfer and Storage Optimizing Code Transformations for Power Consumption and Execution Time Reduction in ACROPOLIS: A Pre-Compiler for Multimedia Applications. Design Automation for Embedded Systems 8, 51–86 (2003). https://doi.org/10.1023/A:1022340119745
DOI: https://doi.org/10.1023/A:1022340119745