skip to main content
research-article

Trade-offs in loop transformations

Published: 07 April 2009 Publication History

Abstract

Nowadays, multimedia systems deal with huge amounts of memory accesses and large memory footprints. To alleviate the impact of these accesses and reduce the memory footprint, high-level memory exploration and optimization techniques have been proposed. These techniques try to more efficiently utilize the memory hierarchy. An important step in these optimization techniques are loop transformations (LT). They have a crucial effect on later data memory footprint optimization steps and code generation. However, the state-of-the-art work has focused only on individual objectives. The main one in literature involves improving the locality of data accesses, and thus reducing the data memory footprint. It does not consider the trade-offs in the LT step in relation to successive optimization steps. Therefore, it is not globally efficient in mapping the application on the target platform.
In this article we will discuss several trade-offs during the loop transformations. To our knowledge, we are the first ones considering these global trade-offs. Previous work always gave mostly one solution, having the best locality and thus the optimized memory footprint, even though some research in two-dimensional trade-offs in this area exists as well. We start from this state-of-the-art solution with minimal footprint. We show that by sacrificing the footprint, we can obtain gains in data reuse (crucial for energy reduction) and reduce the control-flow complexity. We demonstrate our approach on a real-life application, namely the QSDPCM video coder. At the end, we show that considering trade-offs for this application leads to 16% energy reduction in a two-layer memory subsystem and 10% cycle reduction on the ARM platform.

References

[1]
Absar, J., Catthoor, F., and Das, K. 2003. Call-instance based function inlining for increasing data access related optimisation opportunities. Tech. rep., IMEC, Leuven, Belgium.
[2]
Allen, R. and Kennedy, K. 1987. Automatic translation of fortran programs to vector form. ACM Trans. Program. Lang. Syst. 9, 4, 491--542.
[3]
Ancourt, C., Barthou, D., Guettier, C., Irigoin, F., Jeannet, B., Jourdan, J., and Mattioli, J. 1997. Automatic data mapping of signal processing applications. In Proceedings of the IEEE International Conference on Application-Specific Processors, 350--362.
[4]
Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In Proceedings of the International Symposium on Hardware/Software Codesign (CODES), 73--78.
[5]
Banerjee, U., Eigenmann, R., Nicolau, A., and Padua, D. 1993. Automatic program parallelization. Proc. IEEE 81, 2, 211--243.
[6]
Bastoul, C. 2004. Code generation in the polyhedral model is easier than you think. In IEEE International Conference on Parallel Architectureand Compilation Techniques (PACT'13), 7--16.
[7]
Bastoul, C., Cohen, A., Girbal, S., Sharma, S., and Temam, O. 2003. Putting polyhedral loop transformations to work. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computers (LCPC'16). Lecture Notes in Computer Science, vol. 2958, 209--225.
[8]
Brockmeyer, E., Miranda, M., Catthoor, F., and Corporaal, H. 2003. Layer assignment techniques for low energy in multi-layered memory organisations. In Proceedings of the 6th ACM/IEEE Conference on Design and Test in Europe, 1070--1075.
[9]
Carr, S., McKinley, K. S., and Tseng, C.-W. 1994. Compiler optimizations for improving data locality. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 252--262.
[10]
Catthoor, F. 2005. Meta model for human interaction and decision-making: meta-concepts and balloonist vs road-builder metaphore. slide set.
[11]
Catthoor, F., Danckaert, K., Kulkarni, C., Brockmeyer, E., Kjeldsberg, P. G., Van Achteren, T., and Omnes, T. 2002. Data Access and Storage Management for Embedded Programmable Processors. Kluwer Amsterdam.
[12]
Cierniak, M. and Li, W. 1995. Unifying data and control transformations for distributed shared-memory machines. In Proceedings of the Conference on Programming Language Design and Implementation (SIGPLAN'95), ACM, New York, 205--217.
[13]
Clauss, P. and Loechner, V. 1998. Parametric analysis of polyhedral iteration spaces. J. VLSI Signal Processing 19, 179--194.
[14]
Danckaert, K., Catthoor, F., and De Man, H. 2000. A preprocessing step for global loop transformations for data transfer and storage optimization. In Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems, 34--40.
[15]
Darte, A. and Robert, Y. 1995. Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. J. Parall. Distrib. Comput. 29, 1, 43--59.
[16]
De Greef, E., Catthoor, F., and De Man, H. 1997. Memory size reduction through storage order optimization for embedded parallel multimedia applications. J. Parall. Comput. 23, 12. (Special issue on Parallel Processing and Multimedia)
[17]
De Man, H., Catthoor, F., Goossens, G., Vanhoof, J., Van Meerbergen, J., Note, S., and Huisken, J. 1990. Architecture-driven synthesis techniques for vlsi implementation of dsp algorithms. Proc. IEEE, 78, 2. (Special issue on The Future of Computer-Aided Design) 319--335.
[18]
Dezan, C., Le Verge, H., Quinton, P., and Saouter, Y. 1992. The Alpha du CENTAUR experiment. In Algorithms and parallel VLSI architectures II, P. Quinton and Y. Robert (eds.), Elsevier, Amsterdam, 325--334.
[19]
Falk, H. and Marwedel, P. 2003. Control flow optimization by loop nest splitting at the source code level. In Proceedingsof the 6th ACM/IEEE Conference on Design and Test in Europe, 410--415.
[20]
Feautrier, P. 1992. Some efficient solutions to the affine scheduling problems. Int. J. Parall. Program. 21, 5, 389--420.
[21]
Fraboulet, A., Huard, G., and Mignotte, A. 1999. Loop alignment for memory access optimisation. In Proceedings of the 12th ACM/IEEE International Symposium on System-Level Synthesis (ISSS). 71--70.
[22]
Franke, B. and O'Boyle, M. 2003. Array recovery and high-level transformations for dsp applications. ACM Trans. Embed. Comput. Syst. 2, 2, 132--162.
[23]
Hu, Q. 2007. Hierarchical memory size estimation for loop transformation and data memory platform exploration. Ph.D. thesis, Norway University of Science and Technology.
[24]
Kandemir, M., Ramanujam, J., Choudhary, A., and Banerjee, P. 2001. A layout-conscious iteration space transformation technique. IEEE Trans. Computers 50, 12, 1321--1335.
[25]
Kelly, W. and Pugh, W. 1993. A framework for unifying reordering transformations. Tech. rep. CS-TR-3193, Department of Computer Science, Univ. of Maryland, College Park MD.
[26]
Kjeldsberg, P. G. 2001. Storage requirement estimation and optimization for data intensive applications. Ph.D. thesis, Norwegian University of Science and Technology.
[27]
Li, W. and Pingali, K. 1992. A singular loop transformation framework based on non-singular matrices. In Proceedings of the 5th Annual Workshop on Languages and Compilers for Parallelism.
[28]
Manjiakian, N. and Abdelrahman, T. 1995. Fusion of loops for parallelism and locality. Tech. rep. CSRI-315, Computer Systems Reserch Institute, University of Toronto.
[29]
Meng, T. H., Gordon, B., Tsern, E., and Hung, A. 1995. Portable video-on-demand in wireless communication. Proc. IEEE 83, 4 (Special Issue on Low Power Electronics). 659--680.
[30]
Olsen, R. and Gao, G. 1992. Collective analysis and transformation of loop clusters. Tech. rep. ACAPS Technical Memo 24, McGill University.
[31]
Palkovic, M., Corporaal, H., and Catthoor, F. 2006. Dealing with data dependent conditions to enable general global source code transformations. Int. J. Embed. Syst. To appear.
[32]
Polychronopoulos, C. 1988. Compiler optimizations for enhancing parallelism and their impact on the architecture design. IEEE Trans. Computers 37, 8, 991--1004.
[33]
Pugh, W. 1992. The omega test: A fast and practical integer programming algorithm for dependence analysis. Comm. ACM 35, 8.
[34]
Qin, W. and Malik, S. 2003. Flexible and formal modeling of microprocessors with application to retargetable simulation. In Proceedings of the Conference on Design Automation and Test in Europe (DATE 03). 556--561.
[35]
Quillere, F., Rajopadhye, S., and Wilde, D. 2000. Generation of efficient nested loops from polyhedra. Int. J. Parall. Program. 28, 5.
[36]
Sarkar, V. and Gao, G. R. 1991. Optimization of array accesses by collective loop transformations. In Proceedings of the 5th International Conference on Supercomputing. 194--205.
[37]
Schuster, T., Bougard, B., Raghavan, P., Priewasser, R., Novo, D., der Perre, L. V., and Catthoor, F. 2007. Design of a low power pre-synchronization asip for multimode sdr terminals. In Proceedings of the International Symposium on Systems, Architectures, Modeling and Simulation (SAMOS). 322--332.
[38]
Semeria, L. and De Micheli, G. 1998. Spc: synthesis of pointers in c. In Proceedings of the IEEE International Conference on Computer Aided Design. 340--346.
[39]
Song, Y., Xu, R., Wang, C., and Li, Z. 2001. Data locality enhancement by memory reduction. In International Conference on Supercomputing. 50--64.
[40]
Strobach, P. 1988. Qsdpcm: A new technique in scene adaptive coding. In Proceedings of the 4th European Signal Processing Conference (EUSIPCO-88). 1141--1144.
[41]
Thies, W., Vivien, F., Sheldon, J., and Amarasinghe, S. 2001. A unified framework for schedule and storage optimization. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. 232--242.
[42]
Van Achteren, T., Deconinck, G., Catthoor, F., and Lauwereins, R. 2002. Data reuse exploration methodology for loop-dominated applications. In Proceedings of the 5th ACM/IEEE Conference on Design and Test in Europe (DATE). 428--435.
[43]
van Swaaij, M., Franssen, F., Catthoor, F., and De Man, H. 1992. Modelling data and control flow for high-level memory management. In Proceedings of the 3rd ACM/IEEE European Design Automation Conference. 8--13.
[44]
Vanbroekhoven, P., Janssens, G., Bruynooghe, M., Corporaal, H., and Catthoor, F. 2003. Advanced copy propagation for arrays. In Proceedings of the SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'03). 24--33.
[45]
Vander Aa, T., Corporaal, H., Catthoor, F., and Deconinck, G. 2005. Combining data and instruction memory energy optimizations for embedded applications. In Proceedings of the 3rd workshop on Embedded Systems for Real-Time Multimedia. 121--126.
[46]
Verdoolaege, S., Catthoor, F., Bruynooghe, M., and Janssens, G. 2003. Multidimensional incremental loop fusion for data locality. In Proceedings of the IEEE 14th International Conference on Application-specific Systems, Architectures and Processors.
[47]
Wilde, D. 1993. A library for doing polyhedral operations. M.S. thesis, Oregon State University.
[48]
Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of the Conference on Programming Language Design and Implementation (SIGPLAN'91). 30--43.
[49]
Yang, P., Wong, C., Marchal, P., Catthoor, F., Desmet, D., Verkest, D., and Lauwereins, R. 2001. Energy-aware runtime scheduling for embedded multi-processor socs. IEEE Design Test of Comput. 18, 5. (Special issue on Application-Specific Multiprocessor Mapping), 46--58.

Cited By

View all
  • (2019)A methodology correlating code optimizations with data memory accesses, execution time and energy consumptionThe Journal of Supercomputing10.1007/s11227-019-02880-zOnline publication date: 13-May-2019
  • (2018)A methodology for efficient code optimizations and memory managementProceedings of the 15th ACM International Conference on Computing Frontiers10.1145/3203217.3203274(105-112)Online publication date: 8-May-2018
  • (2013)Polyhedral-based data reuse optimization for configurable computingProceedings of the ACM/SIGDA international symposium on Field programmable gate arrays10.1145/2435264.2435273(29-38)Online publication date: 11-Feb-2013
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Design Automation of Electronic Systems
ACM Transactions on Design Automation of Electronic Systems  Volume 14, Issue 2
March 2009
384 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/1497561
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

Publication History

Published: 07 April 2009
Accepted: 01 December 2008
Revised: 01 November 2008
Received: 01 December 2006
Published in TODAES Volume 14, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data transfer and storage exploration
  2. cost components
  3. loop transformations
  4. optimization
  5. trade-offs

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2019)A methodology correlating code optimizations with data memory accesses, execution time and energy consumptionThe Journal of Supercomputing10.1007/s11227-019-02880-zOnline publication date: 13-May-2019
  • (2018)A methodology for efficient code optimizations and memory managementProceedings of the 15th ACM International Conference on Computing Frontiers10.1145/3203217.3203274(105-112)Online publication date: 8-May-2018
  • (2013)Polyhedral-based data reuse optimization for configurable computingProceedings of the ACM/SIGDA international symposium on Field programmable gate arrays10.1145/2435264.2435273(29-38)Online publication date: 11-Feb-2013
  • (2013)Verification of Loop and Arithmetic Transformations of Array-Intensive BehaviorsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2013.227253632:11(1787-1800)Online publication date: 1-Nov-2013
  • (2012)Improving last level cache locality by integrating loop and data transformationsProceedings of the International Conference on Computer-Aided Design10.1145/2429384.2429398(65-72)Online publication date: 5-Nov-2012
  • (2012)Optimizing memory hierarchy allocation with loop transformations for high-level synthesisProceedings of the 49th Annual Design Automation Conference10.1145/2228360.2228586(1233-1238)Online publication date: 3-Jun-2012
  • (2011)Combined loop transformation and hierarchy allocation for data reuse optimizationProceedings of the International Conference on Computer-Aided Design10.5555/2132325.2132368(185-192)Online publication date: 7-Nov-2011
  • (2011)Equivalence Checking of Array-Intensive ProgramsProceedings of the 2011 IEEE Computer Society Annual Symposium on VLSI10.1109/ISVLSI.2011.61(156-161)Online publication date: 4-Jul-2011
  • (2011)Combined loop transformation and hierarchy allocation for data reuse optimizationProceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design10.1109/ICCAD.2011.6105324(185-192)Online publication date: 7-Nov-2011
  • (2010)Experience with Widening Based Equivalence Checking in Realistic Multimedia SystemsJournal of Electronic Testing: Theory and Applications10.1007/s10836-009-5140-426:2(279-292)Online publication date: 1-Apr-2010

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media