Abstract

In today’s embedded systems, the memory hierarchy is rapidly becoming a major factor in power, performance and area. This is especially true for embedded multimedia applications, which typically use temporary multi-dimensional arrays to store intermediate results during processing. In this paper, we propose a new technique that optimizes the use of the cache and the registers. It combines buffer and register allocation to reduce the size of the temporary arrays. First, we use the concept of live data to replace each array by a smaller buffer. Then we replace references to these buffers by registers. The buffer allocation step keeps only useful data in memory, and the register allocation step takes advantage of data reuse in inner loops. The codes considered in this paper are multimedia applications structured as a sequence of loop nests. The experiments are run in a Unix environment and on the StepNP simulator (the MPSoC platform of STMicroelectronics). They show that our technique significantly reduces the number of data cache and TLB misses.
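The two steps described above can be illustrated with a minimal C sketch. It is not taken from the paper: the kernel, the function names (`sum3_full`, `sum3_buffer`, `sum3_regs`), the size `N` and the three-point access pattern are all assumptions chosen for illustration, and the sketch assumes the two loop nests can legally be fused. The first version keeps a full temporary array between a producer nest and a consumer nest, the second replaces it with a buffer holding only the live values, and the third replaces the buffer references with scalars that a compiler can keep in registers.

```c
#include <stddef.h>

#define N 16  /* illustrative problem size (assumption, not from the paper) */

/* Baseline: a full temporary array tmp[N] sits between two loop nests. */
void sum3_full(const int in[N], int out[N]) {
    int tmp[N];
    for (size_t i = 0; i < N; i++)
        tmp[i] = 2 * in[i];                      /* producer nest */
    for (size_t i = 2; i < N; i++)
        out[i] = tmp[i] + tmp[i - 1] + tmp[i - 2];  /* consumer nest */
}

/* Step 1 (buffer allocation): once the nests are fused, only the last
   three values of tmp are live, so a 3-element circular buffer suffices. */
void sum3_buffer(const int in[N], int out[N]) {
    int buf[3];
    for (size_t i = 0; i < N; i++) {
        buf[i % 3] = 2 * in[i];
        if (i >= 2)
            out[i] = buf[i % 3] + buf[(i - 1) % 3] + buf[(i - 2) % 3];
    }
}

/* Step 2 (register allocation): buffer references become rotating
   scalars, which exposes the reuse across inner-loop iterations. */
void sum3_regs(const int in[N], int out[N]) {
    int t2 = 2 * in[0];  /* plays the role of tmp[i - 2] */
    int t1 = 2 * in[1];  /* plays the role of tmp[i - 1] */
    for (size_t i = 2; i < N; i++) {
        int t0 = 2 * in[i];  /* plays the role of tmp[i] */
        out[i] = t0 + t1 + t2;
        t2 = t1;             /* rotate the live values */
        t1 = t0;
    }
}
```

All three functions write the same values to `out[2..N-1]` and leave `out[0]` and `out[1]` untouched. The temporary storage shrinks from `N` elements to three, and in the last version it no longer needs to live in memory at all.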

Author information

Corresponding author

Correspondence to Youcef Bouchebaba.

About this article

Cite this article

Bouchebaba, Y., Girodias, B., Coelho, F. et al. Buffer and Register Allocation for Memory Space Optimization. J VLSI Sign Process Syst Sign Im 49, 123–138 (2007). https://doi.org/10.1007/s11265-006-0001-1

  • DOI: https://doi.org/10.1007/s11265-006-0001-1
