A Data-Driven Approach for Executing the CG Method on Reconfigurable High-Performance Systems

Nowak, Fabian; Besenfelder, Ingo; Karl, Wolfgang; Schmidtobreick, Mareike; Heuveline, Vincent

doi:10.1007/978-3-642-36424-2_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7767)

Included in the following conference series:

International Conference on Architecture of Computing Systems


Abstract

Employing reconfigurable computing systems for numerical applications is a promising approach to increasing performance. We study the applicability of the Convey HC-1 for numerical applications by decomposing a preconditioned conjugate gradient (CG) method into several independent kernels that can operate concurrently. To allow overlapped execution and to minimize data transfers, we stream the data between the kernel units using a central buffer set. A microprogrammable control unit orchestrates memory accesses, buffer writes and reads, and kernel execution, and allows further algorithms to be executed on the available kernel units. Solving the Poisson problem can thereby be accelerated by up to 10 times compared to a single-threaded software version on the HC-1 and by up to 1.2 times compared to a 2-socket hex-core Intel Xeon Westmere system with 24 hardware threads for large problem sizes with only a single application engine.
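The decomposition into kernels described in the abstract can be illustrated with a minimal software sketch of a preconditioned CG iteration. This is not the authors' implementation: it assumes a 1D Poisson matrix tridiag(-1, 2, -1) and a Jacobi (diagonal) preconditioner, neither of which is specified in the abstract, and it runs the kernels one after another rather than concurrently on streamed data. All function names (`apply_poisson_1d`, `dot`, `axpy`, `jacobi_precondition`, `pcg`) are hypothetical.

```python
import math


def apply_poisson_1d(x):
    """Stencil kernel: y = A x for the assumed 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0        # zero Dirichlet boundary
        right = x[i + 1] if i < n - 1 else 0.0   # zero Dirichlet boundary
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(a, b):
    """Reduction kernel: inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Vector-update kernel: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def jacobi_precondition(r):
    """Preconditioner kernel: z = M^{-1} r with M = diag(A) = 2 I (an assumption)."""
    return [ri / 2.0 for ri in r]


def pcg(b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for A x = b, starting from x = 0.

    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual r = b - A x, with x = 0
    z = jacobi_precondition(r)
    p = list(z)                      # initial search direction
    rz = dot(r, z)
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, k
        q = apply_poisson_1d(p)      # stencil kernel
        alpha = rz / dot(p, q)       # reduction kernel
        x = axpy(alpha, p, x)        # update solution
        r = axpy(-alpha, q, r)       # update residual
        z = jacobi_precondition(r)   # preconditioner kernel
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)  # new search direction: z + beta * p
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    x_true = [1.0] * 16
    b = apply_poisson_1d(x_true)
    x, iterations = pcg(b)
    print(iterations, max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

Each helper corresponds to one kind of kernel that such a decomposition might separate: a stencil application, a reduction, a vector update, and a preconditioner application. In this sketch the updates of `x` and `r` within one iteration do not depend on each other, which is the sort of independence that concurrent kernel units could exploit.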



Author information

Authors and Affiliations

Chair for Computer Architecture, Karlsruhe Institute of Technology, Germany
Fabian Nowak, Ingo Besenfelder & Wolfgang Karl
Engineering Mathematics and Computing Lab, Karlsruhe Institute of Technology, Germany
Mareike Schmidtobreick & Vincent Heuveline


Editor information

Editors and Affiliations

FIT, Czech Technical University, Thákurova 9, 160 00, Prague 6, Czech Republic
Hana Kubátová
Elektrotechnik und Informationstechnik, TU Darmstadt, Merckstraße 25, 64283, Darmstadt, Germany
Christian Hochberger
Department of Signal Processing, Institute of Information Theory and Automation, Pod Vodárenskou věží 4, 18208, Prague 8, Czech Republic
Martin Daněk
Intelligent Embedded Systems, University of Kassel, Wilhelmshöher Allee 73, 34121, Kassel, Germany
Bernhard Sick


About this paper

Cite this paper

Nowak, F., Besenfelder, I., Karl, W., Schmidtobreick, M., Heuveline, V. (2013). A Data-Driven Approach for Executing the CG Method on Reconfigurable High-Performance Systems. In: Kubátová, H., Hochberger, C., Daněk, M., Sick, B. (eds) Architecture of Computing Systems – ARCS 2013. ARCS 2013. Lecture Notes in Computer Science, vol 7767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36424-2_15


DOI: https://doi.org/10.1007/978-3-642-36424-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36423-5
Online ISBN: 978-3-642-36424-2
eBook Packages: Computer Science, Computer Science (R0)
