Storage Estimation and Design Space Exploration Methodologies for the Memory Management of Signal Processing Applications

Journal of Signal Processing Systems

Abstract

The storage requirements in data-dominated signal processing systems, whose behavior is described by array-based, loop-organized algorithmic specifications, have an important impact on the overall energy consumption, data access latency, and chip area. This paper gives a tutorial overview of the existing techniques for evaluating the data memory size, an important step in the early stage of system-level exploration. The paper focuses on the most advanced developments in the field, presenting in more detail (1) an estimation approach for non-procedural specifications, where reordering the loop execution within loop nests can yield significant memory savings, and (2) an exact computation approach for procedural specifications, with relevant memory management applications such as measuring the impact of loop transformations on the data storage or analyzing the performance of different signal-to-memory mapping models. Moreover, the paper discusses typical memory management trade-offs, for instance between the storage requirement and the number of memory accesses, that are taken into account when the design space is explored by applying loop transformations to the system specification.
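
The effect of loop reordering on data storage mentioned in the abstract can be illustrated with a small brute-force sketch. It is not one of the estimation or exact computation techniques surveyed in the paper, which work analytically on the specification; it simply replays the accesses of a hypothetical loop nest computing A[i][j] from A[i-1][j] and counts how many array elements are alive at the same time (written, and still to be read). The function names `access_trace` and `max_live_elements` and the loop nest itself are assumptions made for illustration.

```python
def access_trace(n, m, interchange=False):
    """Accesses of a hypothetical nest: A[i][j] = f(A[i-1][j]) for i >= 1.

    Returns a list of ('r' | 'w', (i, j)) events in execution order.
    With interchange=True the j loop becomes the outer loop.
    """
    if interchange:
        order = [(i, j) for j in range(m) for i in range(n)]
    else:
        order = [(i, j) for i in range(n) for j in range(m)]
    trace = []
    for i, j in order:
        if i >= 1:
            trace.append(("r", (i - 1, j)))  # operand produced one row earlier
        trace.append(("w", (i, j)))          # result of this iteration
    return trace


def max_live_elements(trace):
    """Peak number of elements that are written and still awaiting a read."""
    last_read = {}
    for t, (kind, elem) in enumerate(trace):
        if kind == "r":
            last_read[elem] = t
    live, peak = set(), 0
    for t, (kind, elem) in enumerate(trace):
        # An element becomes alive when written, if it is read later on.
        if kind == "w" and last_read.get(elem, -1) > t:
            live.add(elem)
        peak = max(peak, len(live))
        # An element dies at its last read.
        if kind == "r" and last_read[elem] == t:
            live.discard(elem)
    return peak


original = max_live_elements(access_trace(4, 5))
interchanged = max_live_elements(access_trace(4, 5, interchange=True))
print(original, interchanged)
```

For this nest the only dependence has distance vector (1, 0), so interchanging the loops is legal. With the i loop outermost, a whole row must be kept until the next row consumes it; with the j loop outermost, each element is consumed in the very next iteration. Elements of the last row are never read inside the nest and are therefore not counted as alive in this simplified model.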

Notes

  1. This occurs, for instance, when the rank of matrix T is equal to the number of its columns, as proven in [25].

  2. The difference between the two testing platforms does not significantly affect the relative differences between running times. According to [69], the former platform may be slightly faster (by no more than 10–15%) than the latter, especially for applications exploiting the presence of the dual processor. The smaller amount of RAM on the second platform does not affect the performance of the K2 tool, since RAM is not a critical resource for it.

References

  1. Catthoor, F., Wuytack, S., De Greef, E., Balasa, F., Nachtergaele, L., & Vandecappelle, A. (1998). Custom memory management methodology: Exploration of memory organization for embedded multimedia system design. Boston: Kluwer Academic Publishers.

  2. Panda, P. R., Catthoor, F., Dutt, N., Danckaert, K., Brockmeyer, E., Kulkarni, C., et al. (2001). Data and memory optimization techniques for embedded systems. ACM Transactions on Design Automation of Electronic Systems, 6(2), 149–206, April.

  3. Catthoor, F., Danckaert, K., Kulkarni, C., Brockmeyer, E., Kjeldsberg, P. G., Van Achteren, T., et al. (2002). Data access and storage management for embedded programmable processors. Boston: Kluwer Acad. Publ.

  4. Van Achteren, T., Deconinck, G., Catthoor, F., & Lauwereins, R. (2002). Data reuse exploration methodology for loop-dominated applications. In Proc. ACM/IEEE design and test in Europe conf. (pp. 428–435). Paris, France (April).

  5. Hu, Q., Vandecappelle, A., Palkovic, M., Kjeldsberg, P. G., Brockmeyer, E., & Catthoor, F. (2006). Hierarchical memory size estimation for loop fusion and loop shifting in data-dominated applications. In Proc. Asia & S.-Pacific design automation conf. (pp. 606–611). Yokohama, Japan (January).

  6. Luican, I. I., Zhu, H., & Balasa, F. (2006). Formal model of data reuse analysis for hierarchical memory organizations. In Proc. IEEE/ACM int. conf. comp.-aided design (pp. 595–600). San Jose CA (November).

  7. Brockmeyer, E., Miranda, M., Catthoor, F., & Corporaal, H. (2003). Layer assignment techniques for low energy in multi-layered memory organisations. In Proc. ACM/IEEE design automation and test in Europe conf. (pp. 1070–1075). Munich, Germany (March).

  8. Tseng, C. J., & Siewiorek, D. (1986). Automated synthesis of data paths in digital systems. IEEE Transactions on Computer-Aided Design of ICs and Systems, CAD-5(3), 379–395, July.

  9. Goossens, G., Rabaey, J., Vandewalle, J., & De Man, H. (1987). An efficient microcode compiler for custom DSP processors. In Proc. IEEE int. conf. comp.-aided design (pp. 24–27). Santa Clara CA (November).

  10. Hashimoto, A., & Stevens, J. (1971). Wire routing by optimizing channel assignment within large apertures. In Proc. 8th design automation workshop (pp. 155–169).

  11. Kurdahi, F. J., & Parker, A. C. (1987). REAL: A program for register allocation. In Proc. 24th ACM/IEEE design automation conf. (pp. 210–215).

  12. Goossens, G. (1989). Optimization techniques for automated synthesis of application-specific signal-processing architectures. Ph.D. thesis, K.U. Leuven, Belgium.

  13. Paulin, P. G., & Knight, J. P. (1989). Force-directed scheduling for the behavioral synthesis of ASIC’s. IEEE Transactions on Computer-Aided Design of ICs and Systems, 8(6), 661–679, June.

  14. Gebotys, C. H., & Elmasry, M. I. (1992). Optimal VLSI architectural synthesis. Boston: Kluwer Academic Publ.

  15. Stok, L., & Jess, J. (1992). Foreground memory management in data path synthesis. International Journal of Circuit Theory and Applications, 20, 235–255.

  16. Parhi, K. K. (1994). Calculation of minimum number of registers in arbitrary life time chart. IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing, 41(6), 434–436.

  17. Ohm, S. Y., Kurdahi, F. J., & Dutt, N. (1994). Comprehensive lower bound estimation from behavioral descriptions. In Proc. IEEE/ACM int. conf. on computer-aided design (pp. 182–187).

  18. Gajski, D., Vahid, F., Narayan, S., & Gong, J. (1994). Specification and design of embedded systems. Englewood Cliffs: Prentice Hall.

  19. Verbauwhede, I., Scheers, C., & Rabaey, J. M. (1994). Memory estimation for high level synthesis. In Proc. 31st ACM/IEEE design automation conf. (pp. 143–148) (June).

  20. Verbauwhede, I., Catthoor, F., Vandewalle, J., & De Man, H. (1989). Background memory management for the synthesis of algebraic algorithms on multi-processor DSP chips. In Proc. int. conf. on VLSI (pp. 209–218). Munich, Germany (August).

  21. Grun, P., Balasa, F., & Dutt, N. (1998). Memory size estimation for multimedia applications. In Proc. 6th int. workshop hardware/software co-design (pp. 145–149). Seattle WA (March).

  22. Zhao, Y., & Malik, S. (2000). Exact memory size estimation for array computations. IEEE Transactions on VLSI Systems, 8(5), 517–521.

  23. Ramanujam, J., Hong, J., Kandemir, M., & Narayan, A. (2001). Reducing memory requirements of nested loops for embedded systems. In Proc. 38th ACM/IEEE design automation conf. (pp. 359–364) (June).

  24. Balasa, F., Catthoor, F., & De Man, H. (1995). Background memory area estimation for multi-dimensional signal processing systems. IEEE Transactions on VLSI Systems, 3(2), 157–172, June.

  25. Balasa, F., Catthoor, F., & De Man, H. (1997). Practical solutions for counting scalars and dependences in ATOMIUM – a memory management system for multi-dimensional signal processing. IEEE Transactions on CAD of IC’s and Systems, 16(2), 133–145, February.

  26. Darte, A., Schreiber, R., & Villard, G. (2005). Lattice-based memory allocation. IEEE Transactions on Computers, 54, 1242–1257, October.

  27. Banerjee, U. (1988). Dependence analysis for supercomputing. Boston: Kluwer Acad. Publ.

  28. IMEC (2006). Atomium web site. http://www.imec.be/design/atomium/.

  29. Hu, Q., Vandecappelle, A., Kjeldsberg, P. G., Catthoor, F., & Palkovic, M. (2007). Fast memory footprint estimation based on dependency distance vector calculation. In Proc. ACM/IEEE design automation and test in Europe (pp. 379–384). Nice, France (April).

  30. Kjeldsberg, P. G., Catthoor, F., & Aas, E. J. (2003). Data dependency size estimation for use in memory optimization. IEEE Transactions on CAD of IC’s and Systems, 22(7), 908–921, July.

  31. Kjeldsberg, P. G., Catthoor, F., & Aas, E. J. (2004). Storage requirement estimation for optimized design of data intensive applications. ACM Transactions on Design Automation of Electronic Systems, 9, 133–158, April.

  32. Kjeldsberg, P. G., Catthoor, F., & Aas, E. J. (2001). Detection of partially simultaneously alive signals in storage requirement estimation for data-intensive applications. In Proc. 38th ACM/IEEE design automation conf. (pp. 365–370). Las Vegas NV (June).

  33. Moonen, M., Van Dooren, P., & Vandewalle, J. (1992). An SVD updating algorithm for subspace tracking. SIAM Journal on Matrix Analysis and Applications, 13(4), 1015–1038.

  34. Schrijver, A. (1986). Theory of linear and integer programming. New York: Wiley.

  35. Thiele, L. (1992). Compiler techniques for massive parallel architectures. In P. Dewilde (Ed.), State-of-the-art in computer science. Boston: Kluwer Acad. Publ.

  36. Zhu, H., Luican, I. I., & Balasa, F. (2006). Memory size computation for multimedia processing applications. In Proc. Asia & South-Pacific design automation conf. (pp. 802–807). Yokohama, Japan (January).

  37. Balasa, F., Zhu, H., & Luican, I. I. (2007). Computation of storage requirements for multi-dimensional signal processing applications. IEEE Transactions on VLSI Systems, 15(4), 447–460, April.

  38. Pugh, W., & Wonnacott, D. (1993). An exact method for analysis of value-based array data dependences. In Proc. 6th int. workshop languages and compilers for parallel computing (pp. 546–566). Portland OR (August).

  39. Verdoolaege, S., Beyls, K., Bruynooghe, M., & Catthoor, F. (2005). Experiences with enumeration of integer projections of parametric polytopes. In R. Bodik (Ed.), Compiler construction: 14th int. conf. (Vol. 3443, pp. 91–105). Berlin: Springer.

  40. Clauss, P., & Loechner, V. (1998). Parametric analysis of polyhedral iteration spaces. Journal of VLSI Signal Processing, 19(2), 179–194.

  41. Verdoolaege, S., Seghir, R., Beyls, K., Loechner, V., & Bruynooghe, M. (2004). Analytical computation of Ehrhart polynomials: Enabling more compiler analyses and optimizations. In Proc. int. conf. compilers arch. and synthesis for embedded syst. (pp. 248–258) (September).

  42. Dantzig, G. B., & Eaves, B. C. (1973). Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A), 14, 288–297.

  43. Pugh, W. (1992). A practical algorithm for exact array dependence analysis. Communications of the ACM, 35(8), 102–114, August.

  44. Barvinok, A. I. (1994). A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. Mathematics of Operations Research, 19(4), 769–779, November.

  45. De Greef, E., Catthoor, F., & De Man, H. (1997). Memory size reduction through storage order optimization for embedded parallel multimedia applications. In A. Krikelis (Ed.), Parallel computing, special issue on “Parallel Processing and Multi-media” (Vol. 23, no. 12). Amsterdam: Elsevier (December).

  46. Tronçon, R., Bruynooghe, M., Janssens, G., & Catthoor, F. (2002). Storage size reduction by in-place mapping of arrays. In A. Cortesi (Ed.), Verification, model checking and abstract interpretation (pp. 167–181).

  47. Luican, I. I., Zhu, H., & Balasa, F. (2007). Signal-to-memory mapping analysis for multimedia signal processing. In Proc. Asia & South-Pacific design automation conf. (pp. 486–491). Yokohama, Japan (January).

  48. Absar, J., Catthoor, F., & Das, K. (2003). Call-instance based function inlining for increasing data access related optimization opportunities. Technical report, IMEC, Leuven, Belgium.

  49. Dasygenis, M., Brockmeyer, E., Durinck, B., Catthoor, F., Soudris, D., & Thanailakis, A. (2004). Power, performance and area exploration for data memory assignment of multimedia applications. In A. Pimentel, & S. Vassiliadis (Eds.), Proc. systems, architectures, modeling, and simulation, LNCS (Vol. 3133, pp. 540–549). Samos: Springer-Verlag (June).

  50. Shashidhar, K. C., Vandecappelle, A., & Catthoor, F. (2001). Low power design of turbo decoder module with exploration of energy-performance trade-offs. In Proc. workshop on compilers and operating systems for low power in conjunction with int. conf. on parallel arch. and compilation techniques (pp. 10.1–10.6). Barcelona, Spain (September).

  51. Bastoul, C. (2004). Code generation in the polyhedral model is easier than you think. In Proc. int. conf. on parallel arch. and compilation techniques (pp. 7–16). Juan-les-Pins, France (September).

  52. Quillere, F., Rajopadhye, S., & Wilde, D. (2000). Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming, 28(5), October.

  53. Banerjee, U., Eigenmann, R., Nicolau, A., & Padua, D. (1993). Automatic program parallelization. Proceedings of the IEEE, 81(2), 211–243, February.

  54. Feautrier, P. (1991). Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20(1), 23–52.

  55. Darte, A., & Robert, Y. (1995). Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. Journal on Parallel and Distributed Computing, 29(1), 43–59.

  56. Kandemir, M., Ramanujam, J., Choudhary, A., & Banerjee, P. (2001). A layout-conscious iteration space transformation technique. IEEE Transactions on Computers, 50(12), 1321–1335.

  57. Kelly, W., & Pugh, W. (1993). A framework for unifying reordering transformations. Univ. Maryland College Park, CS-TR-3193 (April).

  58. Wolf, M. E., & Lam, M. S. (1991). A data locality optimizing algorithm. In Proc. SIGPLAN conf. on programming language design and implementation (pp. 30–43) Toronto, Canada (June).

  59. Bastoul, C., Cohen, A., Girbal, A., Sharma, S., & Temam, O. (2003). Putting polyhedral loop transformations to work. In Proc. int. workshop languages & compilers for parallel comput. (pp. 209–225) (September).

  60. Semeria, L., & De Micheli, G. (1998). SpC: Synthesis of pointers in C. In Proc. IEEE/ACM int. conf. comp.-aided design (pp. 340–346). Santa Clara CA (November).

  61. Franke, B., & O’Boyle, M. (2003). Array recovery and high-level transformations for DSP applications. ACM Transactions on Embedded Computing Systems, 2(2), 132–162, May.

  62. Vanbroekhoven, P., Janssens, G., Bruynooghe, M., Corporaal, H., & Catthoor, F. (2005). Transformation to dynamic single assignment using a simple data flow analysis. In Proc. 3rd Asian symp. on programming languages and syst., Tsukuba, Japan, and in Lecture Notes Comp. Sc. (Vol. 3780, pp. 330–346). Springer Verlag (November).

  63. Vanbroekhoven, P., Janssens, G., Bruynooghe, M., Corporaal, H., & Catthoor, F. (2003). Advanced copy propagation for arrays. In Proc. SIGPLAN conf. languages, compilers, and tools for embedded syst. (pp. 24–33). San Diego CA (June).

  64. Palkovic, M., Corporaal, H., & Catthoor, F. (2005). Global memory optimisation for embedded systems allowed by code duplication. In Proc. 9th int. Wsh. on software and compilers for embedded systems (pp. 72–80) (September).

  65. Palkovic, M., Corporaal, H., & Catthoor, F. (2008). Dealing with data dependent conditions to enable general global source code transformations. International Journal of Embedded Systems, Interscience Publ. (in press).

  66. Palkovic, M., Brockmeyer, E., Vanbroekhoven, P., Corporaal, H., & Catthoor, F. (2006). Systematic pre processing of data dependent constructs for embedded systems. Journal of Low Power Electronics, 2(1), 9–17, April.

  67. Catthoor, F., & Brockmeyer, E. (2000). Unified meta-flow summary for low-power data-dominated applications. In F. Catthoor (Ed.), Unified low-power design flow for data-dominated multi-media and telecom applications (pp. 7–23). Boston: Kluwer Acad. Publ.

  68. Pareto, V. (1896). Cours D’Economie Politique, volume I–II. Lausanne.

  69. Tom’s Hardware (2003). Benchmark Marathon: 65 CPUs from 100 MHz to 3066 MHz (Online). Available: http://www.tomshardware.com/2003/02/17/benchmark_marathon/index.html

  70. Strobach, P. (1988). QSDPCM – A new technique in scene adaptive coding. In Proc. 4th Eur. signal processing conf. (pp. 1141–1144). Grenoble, France, Amsterdam: Elsevier Publ. (September).

  71. Palkovic, M. (2007). Enhanced applicability of loop transformations (Chapter 6). Ph.D. Thesis, Eindhoven University of Technology, Dept. of Electrical Eng. (September).

Author information

Corresponding author

Correspondence to F. Balasa.

Additional information

This research was sponsored in part by the U.S. National Science Foundation (DAP 0133318).

About this article

Cite this article

Balasa, F., Kjeldsberg, P.G., Vandecappelle, A. et al. Storage Estimation and Design Space Exploration Methodologies for the Memory Management of Signal Processing Applications. J Sign Process Syst Sign Image Video Technol 53, 51–71 (2008). https://doi.org/10.1007/s11265-008-0244-0

  • DOI: https://doi.org/10.1007/s11265-008-0244-0
