Abstract
Reconfigurable computing systems are a promising route to higher performance for numerical applications. We study the applicability of the Convey HC-1 to such applications by decomposing a preconditioned conjugate gradient (CG) method into several independent kernels that can operate concurrently. To allow overlapped execution and to minimize data transfers, we stream the data between the kernel units through a central buffer set. A microprogrammable control unit orchestrates memory accesses, buffer reads and writes, and kernel execution, and allows further algorithms to be executed on the available kernel units. Solving the Poisson problem can thereby be accelerated by up to 10 times compared with a single-threaded software version on the HC-1, and by up to 1.2 times compared with a 2-socket hex-core Intel Xeon Westmere system with 24 hardware threads, for large problem sizes and with only a single application engine.
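To make the kernel decomposition concrete, the following is a minimal software sketch of a preconditioned CG iteration split into separate kernels (stencil application, dot product, vector update, preconditioner). It is not the hardware design described in the paper: the 1D Poisson matrix tridiag(-1, 2, -1), the Jacobi preconditioner, and all function names (`apply_poisson_1d`, `dot`, `axpy`, `jacobi_precondition`, `pcg`) are assumptions chosen for illustration only.

```python
def apply_poisson_1d(x):
    """Stencil kernel: y = A x for the assumed matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(a, b):
    """Reduction kernel: inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Vector-update kernel: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def jacobi_precondition(r):
    """Preconditioner kernel (assumed Jacobi): z = D^{-1} r, where diag(A) = 2."""
    return [ri / 2.0 for ri in r]


def pcg(b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for A x = b; returns (x, iterations used)."""
    x = [0.0] * len(b)
    b_norm2 = dot(b, b)
    if b_norm2 == 0.0:
        return x, 0
    r = list(b)                      # residual for the initial guess x = 0
    z = jacobi_precondition(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        q = apply_poisson_1d(p)      # stencil kernel
        alpha = rz / dot(p, q)       # reduction kernel
        x = axpy(alpha, p, x)        # these two updates are independent of
        r = axpy(-alpha, q, r)       # each other and could run concurrently
        if dot(r, r) <= tol * tol * b_norm2:
            return x, k
        z = jacobi_precondition(r)
        rz_new = dot(r, z)
        p = axpy(rz_new / rz, p, z)  # p = z + beta * p
        rz = rz_new
    return x, max_iter
```

In this sketch each kernel consumes and produces whole vectors, which is the property that would let intermediate results be streamed from one unit to the next instead of being written back to memory between steps.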
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nowak, F., Besenfelder, I., Karl, W., Schmidtobreick, M., Heuveline, V. (2013). A Data-Driven Approach for Executing the CG Method on Reconfigurable High-Performance Systems. In: Kubátová, H., Hochberger, C., Daněk, M., Sick, B. (eds) Architecture of Computing Systems – ARCS 2013. ARCS 2013. Lecture Notes in Computer Science, vol 7767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36424-2_15
DOI: https://doi.org/10.1007/978-3-642-36424-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36423-5
Online ISBN: 978-3-642-36424-2
eBook Packages: Computer Science (R0)