Reducing Off-Chip Memory Access via Stream-Conscious Tiling on Multimedia Applications

International Journal of Parallel Programming

The iteration space of a loop nest is the set of all loop iterations bounded by the loop limits. Tiling the iteration space can effectively exploit the available parallelism, which is essential to multiprocessor compilation and pipelined architecture design. Tiling also improves data locality, which can dramatically reduce memory accesses and, consequently, the associated memory access energy consumption. However, previous studies of tiling were based on data dependences, so arrays without dependences, such as input arrays (data streams), were not considered. In this paper, we extend tiling exploration to accommodate these dependence-free arrays and propose a stream-conscious tiling scheme for off-chip memory access optimization. We show that when the goal is memory access optimization rather than parallelism extraction, input arrays are as important as, if not more important than, arrays with data dependences. Our approach is verified on TI's low-power C55x DSP with popular multimedia applications, reducing off-chip memory accesses by 67% on average compared with traditional iteration space tiling.
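The abstract contrasts dependence-driven iteration space tiling with tiling that also accounts for read-only input arrays. The Python below is a minimal sketch of that idea. It is not the authors' stream-conscious algorithm, which the abstract does not spell out. It tiles a 3x3 filter loop nest over a read-only input frame into column strips and counts off-chip loads. The on-chip memory is assumed to be a fully associative LRU buffer holding `capacity` array elements, and every miss counts as one off-chip load. This is a modelling assumption only; a real DSP memory system behaves differently. The names `LRUBuffer` and `box3x3`, the frame size, the strip width, and the buffer size are hypothetical choices for illustration. Neither array in this loop nest carries a loop-carried dependence, so any locality gain comes entirely from reuse of the dependence-free input frame.

```python
from collections import OrderedDict


class LRUBuffer:
    """Hypothetical on-chip buffer model: fully associative, LRU, one element per slot.

    Each access that misses is counted as one off-chip load (an assumption
    made for illustration, not a model of any specific DSP).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()
        self.off_chip_loads = 0

    def read(self, frame, r, c):
        key = (r, c)
        if key in self.slots:
            self.slots.move_to_end(key)  # hit: mark as most recently used
        else:
            self.off_chip_loads += 1  # miss: fetch from off-chip memory
            self.slots[key] = True
            if len(self.slots) > self.capacity:
                self.slots.popitem(last=False)  # evict the least recently used element
        return frame[r][c]


def box3x3(frame, buf, tile_cols=None):
    """3x3 box sum over a read-only input frame (a dependence-free data stream).

    tile_cols=None runs the untiled row-major loop nest. Otherwise the output
    columns are split into strips of tile_cols, so the input rows a strip
    needs can stay in the buffer from one output row to the next.
    """
    h, w = len(frame), len(frame[0])
    out_h, out_w = h - 2, w - 2
    out = [[0] * out_w for _ in range(out_h)]
    step = out_w if tile_cols is None else tile_cols
    for j0 in range(0, out_w, step):  # tile loop (a single strip when untiled)
        for i in range(out_h):  # element loops inside the strip
            for j in range(j0, min(j0 + step, out_w)):
                acc = 0
                for di in range(3):
                    for dj in range(3):
                        acc += buf.read(frame, i + di, j + dj)
                out[i][j] = acc
    return out


if __name__ == "__main__":
    frame = [[(r * 7 + c * 3) % 256 for c in range(32)] for r in range(32)]
    plain = LRUBuffer(capacity=64)
    tiled = LRUBuffer(capacity=64)
    # Tiling reorders the iterations but must not change the result.
    assert box3x3(frame, plain) == box3x3(frame, tiled, tile_cols=8)
    print("untiled loads:", plain.off_chip_loads, "tiled loads:", tiled.off_chip_loads)
```

With these toy parameters, each 8-column strip keeps its three input rows (about 30 elements) in the 64-element buffer between consecutive output rows. The untiled sweep needs about three full rows (about 96 elements) between reuses, so it refetches each input element once per output row that uses it. The resulting counts describe this toy model only and are unrelated to the 67% figure reported in the paper.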

Author information

Correspondence to Chunhui Zhang.

About this article

Cite this article

Zhang, C., Kurdahi, F. Reducing Off-Chip Memory Access via Stream-Conscious Tiling on Multimedia Applications. Int J Parallel Prog 35, 63–98 (2007). https://doi.org/10.1007/s10766-006-0027-9

  • DOI: https://doi.org/10.1007/s10766-006-0027-9
