Exploration of 3D grid caching strategies for ray-shooting

  • Special Issue
Journal of Real-Time Image Processing

Abstract

Technology evolution gives easy access to high-performance dedicated computing machines built on, for example, GPUs or FPGAs. When designing algorithms that deal with highly structured multidimensional data, the real bottleneck is often memory access. The strategies implemented in standard CPU cache architectures are no longer efficient because of the level of parallelism and the inherent structure of the data. This article presents the so-called “n-Dimensional Adaptive and Predictive Cache” (nD-AP Cache) architecture, which aims at efficient data access for grid traversal. A theoretical model of the 3D version of the cache was set up to predict the cache efficiency for given statistical characteristics of the access sequences and given cache parameters. Ray-shooting algorithms were chosen as a practical example to carefully explore the design space and exercise the 3D-AP Cache. For this purpose, a simulation model and a fully functional emulation platform have been designed. Given the proven efficiency of the architecture, further improvements and applications of the nD-AP Cache are discussed. Comparisons with standard caches show that the nD-AP Cache can be twice as efficient as an “ideal” associative cache while using four times less memory.

Notes

  1. International Technology Roadmap for Semiconductors, http://www.itrs.net

  2. which may be emitted or re-emitted light (rendering), density (PET reconstruction), attenuation (X-ray based reconstruction), …

  3. The 2D-AP Cache deals with 2D array and behaves similarly to the 3D-AP Cache.

Author information

Corresponding author

Correspondence to Zahir Larabi.

About this article

Cite this article

Mancini, S., Larabi, Z., Mathieu, Y. et al. Exploration of 3D grid caching strategies for ray-shooting. J Real-Time Image Proc 7, 3–19 (2012). https://doi.org/10.1007/s11554-010-0176-3

  • DOI: https://doi.org/10.1007/s11554-010-0176-3
