Accelerating thread-intensive and explicit memory management programs with dynamic partial reconfiguration

Yang, Qianming; Wen, Mei; Wu, Nan; Zhang, Chunyuan

doi:10.1007/s11227-012-0828-0


Published: 11 December 2012

Volume 63, pages 508–537, (2013)

The Journal of Supercomputing


Abstract

Recent research has shown that field programmable gate arrays (FPGAs) have great potential for accelerating demanding applications, such as high-performance digital signal processing applications for low-volume markets. One disadvantage of using FPGAs is the loss of generality in the architecture; however, the reconfigurability of FPGAs allows them to be reprogrammed for other applications. Therefore, a uniform FPGA-based architecture, an efficient programming model, and a simple mapping method are paramount for the wide acceptance of FPGA technology. This paper presents MASALA, a dynamically reconfigurable FPGA-based accelerator for parallel programs written in thread-intensive and explicit memory management (TEMM) programming models. Our system uses a TEMM programming model to parallelize demanding applications, which involves decomposing an application into separate thread blocks and decoupling computation from data loads and stores. Hardware engines are incorporated into MASALA as partial dynamic reconfiguration modules, each of which encapsulates a thread process engine that implements the thread functionality in hardware. MASALA also includes a data dispatching scheme that enables explicit communication among multiple memory hierarchies, such as between hardware engines and between host processors and hardware engines. Finally, this paper presents MASALA-SX, a multi-FPGA prototype system of the proposed architecture. A large synthetic aperture radar image formatting experiment shows that MASALA's architecture facilitates the construction of a TEMM program accelerator, providing higher performance and lower power consumption than current CPU platforms without sacrificing programmability, flexibility, or scalability.
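The decomposition that the abstract describes (separate thread blocks, with data loads and stores decoupled from computation) can be illustrated with a minimal software sketch. This is not MASALA's interface or the authors' code: the names `run_thread_blocks` and `scale_kernel`, the flat list standing in for global memory, and the block size are all hypothetical, chosen only to show the load / compute / store pattern.

```python
from typing import Callable, List


def run_thread_blocks(
    global_in: List[float],
    block_size: int,
    kernel: Callable[[List[float]], List[float]],
) -> List[float]:
    """Apply `kernel` block by block with explicit load, compute, and store phases.

    Hypothetical illustration of a TEMM-style pattern, not MASALA's API.
    The kernel must return as many elements as it receives.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    global_out = [0.0] * len(global_in)
    for start in range(0, len(global_in), block_size):
        end = min(start + block_size, len(global_in))
        # Load: copy one block from "global" memory into a private local buffer.
        local_in = global_in[start:end]
        # Compute: the kernel only ever sees the local buffer.
        local_out = kernel(local_in)
        # Store: write the block's result back explicitly.
        global_out[start:end] = local_out
    return global_out


def scale_kernel(block: List[float]) -> List[float]:
    """Example per-block computation (assumed for illustration): double each value."""
    return [2.0 * x for x in block]


result = run_thread_blocks([1.0, 2.0, 3.0, 4.0, 5.0], 2, scale_kernel)
print(result)
```

Because each block touches only its own local buffer, the blocks are independent. In a real system the load and store phases could overlap with computation on other blocks; the sequential loop here does not attempt that.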



Acknowledgements

This research was supported by the NSFC under Grant No. 61033008, 60903041, and 61103080, SRFDP under Grant No. 20104307110002, the Hunan Provincial Innovation Foundation For Postgraduate under Grant No. CX2010B028, and the Fund of Innovation in Graduate School of NUDT under Grant No. B100603.

Author information

Authors and Affiliations

Computer School, National University of Defense Technology, Changsha, 410073, P.R. China
Qianming Yang, Mei Wen, Nan Wu & Chunyuan Zhang


Corresponding author

Correspondence to Qianming Yang.


About this article

Cite this article

Yang, Q., Wen, M., Wu, N. et al. Accelerating thread-intensive and explicit memory management programs with dynamic partial reconfiguration. J Supercomput 63, 508–537 (2013). https://doi.org/10.1007/s11227-012-0828-0


Published: 11 December 2012
Issue Date: February 2013
DOI: https://doi.org/10.1007/s11227-012-0828-0
