Abstract
Memory accesses contribute substantially to aggregate system delays. Designers must therefore ensure that the memory subsystem is efficient, and much work has been done on exploiting data re-use in FPGAs for algorithms that exhibit static memory access patterns. The proposed scheme enables the exploitation of data re-use for both static and non-static parallel memory access patterns through a multi-port cache whose parameters can be determined at compile time and matched to the statistical properties of the application, and in which sub-cache contentions are arbitrated by a semaphore-based system. A complete hardware implementation demonstrates that, for a motion vector estimation benchmark, the proposed caching scheme reduces cycle count by 51% and execution time by up to 24% on a Xilinx XC2V6000 FPGA on a Celoxica RC300 board. Hardware resource usage and clock frequency penalties are analyzed while varying the number of ports and the cache size. This analysis shows how the optimum cache size and number of ports may be established for a given datapath.
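The interplay between shared data re-use and sub-cache contention can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the hardware design evaluated in the paper: the function name `simulate`, the address-interleaved bank mapping, the direct-mapped sub-caches, the fixed-priority arbitration, and the `miss_penalty` value are all hypothetical choices made for illustration. A per-cycle set of claimed banks stands in for the semaphores.

```python
from collections import deque


def simulate(port_streams, num_banks, lines_per_bank, miss_penalty=4):
    """Cycle-count a toy multi-port cache (illustrative assumptions only).

    port_streams: one list of integer addresses per port.
    Returns (cycles, hits, misses).
    """
    # Each bank is a direct-mapped sub-cache; tags[bank][index] holds an address.
    tags = [[None] * lines_per_bank for _ in range(num_banks)]
    queues = [deque(stream) for stream in port_streams]
    stall = [0] * len(queues)  # remaining miss-penalty cycles per port
    cycles = hits = misses = 0

    while any(queues) or any(stall):
        cycles += 1
        taken = set()  # banks whose "semaphore" is held in this cycle
        for port, queue in enumerate(queues):
            if stall[port] > 0:
                stall[port] -= 1  # still waiting on an earlier miss
                continue
            if not queue:
                continue
            addr = queue[0]
            bank = addr % num_banks  # assumed address interleaving
            if bank in taken:
                continue  # lost arbitration; retry in the next cycle
            taken.add(bank)  # lower-numbered ports win (fixed priority)
            index = (addr // num_banks) % lines_per_bank
            if tags[bank][index] == addr:
                hits += 1
            else:
                misses += 1
                tags[bank][index] = addr
                stall[port] = miss_penalty
            queue.popleft()
    return cycles, hits, misses
```

In this model, two ports that request the same address contend for one bank, but the second port then hits on the data the first port fetched. Ports that address different banks proceed in the same cycle. Sweeping `num_banks` and `lines_per_bank` over a recorded access trace gives a rough feel for the trade-off between port count and cache size, although it says nothing about the clock frequency or resource penalties measured in hardware.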
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ang, SS., Constantinides, G., Cheung, P., Luk, W. (2006). A Flexible Multi-port Caching Scheme for Reconfigurable Platforms. In: Bertels, K., Cardoso, J.M.P., Vassiliadis, S. (eds) Reconfigurable Computing: Architectures and Applications. ARC 2006. Lecture Notes in Computer Science, vol 3985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11802839_29
DOI: https://doi.org/10.1007/11802839_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36708-6
Online ISBN: 978-3-540-36863-2
eBook Packages: Computer Science, Computer Science (R0)