DOI: 10.1145/2684746.2689103
Poster

Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only)

Published: 22 February 2015

Abstract

The memory architecture has a significant effect on the flexibility and performance of a coarse-grained reconfigurable array (CGRA), both of which can be limited by configuration overhead and by long data-transmission latency. Multi-context structures and data preloading are widely used in popular CGRAs to relieve the context and data bandwidth bottlenecks. However, neither scheme strikes a good balance among computing performance, area overhead, and flexibility. This paper proposes a group-based context cache and a multi-level data memory architecture to alleviate these bottlenecks. The group-based context cache dynamically transfers and buffers contexts inside the CGRA, reducing off-chip memory accesses for contexts at runtime. The multi-level data memory adds data memories at different levels of the CGRA hierarchy, where they serve as buffers for reused input data and intermediate data. Both architectures are cost-effective: they improve performance at the cost of only a minor area overhead. In experiments, an H.264 video decoding program and the scale-invariant feature transform (SIFT) algorithm achieved performance improvements of 19% and 23%, respectively. Furthermore, the complexity of applications running on the CGRA is no longer restricted by the capacity of the on-chip context memory, thereby enabling flexible configuration of the CGRA. The proposed memory architectures are based on a generic CGRA architecture derived from characteristics shared by most existing popular CGRAs, so they can be applied to CGRAs in general.
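The abstract does not specify the group size, cache capacity, or replacement policy of the group-based context cache. As a rough illustration of why buffering contexts in groups inside the array can cut off-chip context traffic, the following Python sketch models a hypothetical context cache that holds whole groups of contexts and evicts the least recently used group. The class and function names, the parameters `capacity_groups` and `group_size`, and the LRU policy are all assumptions made for illustration; this is not the authors' design.

```python
from collections import OrderedDict
from typing import Iterable


class GroupContextCache:
    """Hypothetical on-array cache that stores whole groups of contexts.

    Illustrative model only: the grouping rule (consecutive context IDs) and the
    LRU replacement policy are assumptions, not taken from the paper.
    """

    def __init__(self, capacity_groups: int, group_size: int) -> None:
        if capacity_groups < 1 or group_size < 1:
            raise ValueError("capacity_groups and group_size must be positive")
        self.capacity_groups = capacity_groups
        self.group_size = group_size
        self._groups: "OrderedDict[int, None]" = OrderedDict()  # resident group IDs, LRU first
        self.offchip_fetches = 0  # number of group transfers from off-chip memory

    def access(self, context_id: int) -> bool:
        """Return True on a hit; on a miss, fetch the whole group holding context_id."""
        group = context_id // self.group_size
        if group in self._groups:
            self._groups.move_to_end(group)  # mark as most recently used
            return True
        if len(self._groups) >= self.capacity_groups:
            self._groups.popitem(last=False)  # evict the least recently used group
        self._groups[group] = None
        self.offchip_fetches += 1
        return False


def count_offchip_fetches(trace: Iterable[int], capacity_groups: int, group_size: int) -> int:
    """Replay a context-ID trace and return how many groups were fetched off-chip."""
    cache = GroupContextCache(capacity_groups, group_size)
    for context_id in trace:
        cache.access(context_id)
    return cache.offchip_fetches


if __name__ == "__main__":
    # A hypothetical kernel that cycles through 8 contexts three times.
    trace = list(range(8)) * 3
    print(count_offchip_fetches(trace, capacity_groups=2, group_size=4))  # 2: both groups stay resident
    print(count_offchip_fetches(trace, capacity_groups=1, group_size=4))  # 6: the two groups evict each other every pass
```

The point of the sketch is only qualitative: once the working set of context groups fits in the on-array cache, repeated kernel iterations stop generating off-chip context traffic, and an application larger than the cache still runs, just with more transfers. It ignores transfer sizes, latency, and prefetching, all of which a real design would have to model.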

      Published In

      FPGA '15: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2015
      292 pages
      ISBN:9781450333153
      DOI:10.1145/2684746
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cache prefetch
      2. cgra
      3. context cache
      4. data memory
      5. memory architecture

      Qualifiers

      • Poster

      Funding Sources

      • Science and Technology Project of Jiangxi Province China
      • China National High Technologies Research Program
      • Projects from State Grid Corporation of China

      Conference

      FPGA '15

      Acceptance Rates

      FPGA '15 paper acceptance rate: 20 of 102 submissions (20%)
      Overall acceptance rate: 125 of 627 submissions (20%)
