poster

Exploiting algorithmic-level memory parallelism in distributed logic-memory architecture through hardware-assisted dynamic graph (abstract only)

Authors:

Yu Bai,

Abigail Fuentes,

Mingjie Lin,

Mike Riera

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Page 273

https://doi.org/10.1145/2435264.2435333

Published: 11 February 2013

Abstract

Emerging FPGA devices, which integrate abundant RAM blocks and high-performance processor cores, offer an unprecedented opportunity to implement single-chip distributed logic-memory (DLM) architectures effectively. Being "memory-centric", the DLM architecture can significantly improve the overall performance and energy efficiency of many memory-intensive embedded applications, especially those that exhibit irregular array data access patterns at the algorithmic level. However, implementing a DLM architecture poses unique challenges to an FPGA designer in terms of 1) organizing and partitioning diverse on-chip memory resources, and 2) orchestrating effective data transmission between on-chip and off-chip memory. In this paper, we offer solutions to both challenges. Specifically, 1) we propose a stochastic memory partitioning scheme based on the well-known simulated annealing algorithm, which explores a large solution space to find memory partitionings that promote parallel memory accesses; and 2) we augment the proposed DLM architecture with a reconfigurable hardware graph that dynamically computes the precedence relationships between memory partitions, thus effectively exploiting algorithmic-level memory parallelism on a per-application basis. We evaluate the effectiveness of our approach (A3) against two other DLM architecture synthesis methods: an algorithm-centric reconfigurable computing architecture with a single monolithic memory (A1) and the heterogeneous distributed architectures synthesized according to (A2). All experiments were conducted on a Virtex-5 (XCV5LX155T-2) FPGA. On average, our experimental results show that the proposed A3 architecture outperforms A2 and A1 by 34% and 250%, respectively. More than 70% of A3's performance improvement over A2 comes from the hardware graph-based memory scheduling.
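The abstract does not give the cost function or move set used by the simulated annealing partitioner, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes a hypothetical cost equal to the number of array pairs that are accessed together yet placed in the same memory bank, a single-array reassignment move, and geometric cooling. The names `conflict_cost` and `anneal_partition`, the parameters, and the example access pattern are all invented for illustration and are not taken from the paper.

```python
import math
import random


def conflict_cost(assign, conflicts):
    """Count array pairs that are accessed together but share a bank (assumed cost)."""
    return sum(1 for a, b in conflicts if assign[a] == assign[b])


def anneal_partition(arrays, conflicts, num_banks, seed=0,
                     t_start=2.0, t_end=0.01, alpha=0.95, moves_per_temp=50):
    """Assign each array to a bank with simulated annealing.

    Returns the best assignment seen and its cost. All parameters are
    illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    assign = {a: rng.randrange(num_banks) for a in arrays}
    cost = conflict_cost(assign, conflicts)
    best, best_cost = dict(assign), cost
    t = t_start
    while t > t_end and best_cost > 0:
        for _ in range(moves_per_temp):
            # Propose moving one randomly chosen array to a random bank.
            a = rng.choice(arrays)
            old_bank = assign[a]
            assign[a] = rng.randrange(num_banks)
            new_cost = conflict_cost(assign, conflicts)
            delta = new_cost - cost
            # Always accept non-worsening moves; accept worsening moves
            # with probability exp(-delta / t) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(assign), cost
            else:
                assign[a] = old_bank  # reject: undo the move
        t *= alpha  # geometric cooling
    return best, best_cost


# Hypothetical access pattern: each pair is read in the same cycle.
arrays = ["A", "B", "C", "D"]
conflicts = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
best, best_cost = anneal_partition(arrays, conflicts, num_banks=2)
print(best, best_cost)
```

A cost of zero means every pair of simultaneously accessed arrays sits in different banks, so those accesses could in principle proceed in parallel. A realistic partitioner would also need to account for bank capacity and port counts, which this sketch ignores.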

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Recommendations

Impact of Parallelism and Memory Architecture on FPGA Communication Energy
Regular Papers and Special Section on Field Programmable Gate Arrays (FPGA) 2015

The energy in FPGA computations is dominated by data communication energy, either in the form of memory references or data movement on interconnect. In this article, we explore how to use data placement and parallelism to reduce communication energy. We ...

Exploiting Memory-Level Parallelism in Reconfigurable Accelerators
FCCM '12: Proceedings of the 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines

As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework where parallelism ...

Area Evaluation of Memory Array Based PLD Architecture by Mapping Arithmetic Circuits
IWIA '11: Proceedings of the 2011 International Workshop on Innovative Architecture for Future Generation Processors and Systems

We are developing a reconfigurable device MPLD whose basic structure is arrays of memory that function as Look-up Tables (LUTs) with n inputs and n outputs. In our MPLD architecture, these LUTs are directly interconnected with each other, so it doesn't ...

Information

Published In

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

February 2013

294 pages

ISBN:9781450318877

DOI:10.1145/2435264

General Chair: Brad Hutchings, Brigham Young University, USA

Program Chair: Vaughn Betz, University of Toronto, Canada

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

FPGA '13

Sponsor:

SIGDA

FPGA '13: The 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 11 - 13, 2013

Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
