Abstract
The solution of large systems of linear equations is typically achieved by iterative methods. The rate of convergence of these methods can be substantially improved by the use of preconditioners, which can be either applied in a black-box fashion to the linear system, or exploit properties specific to the underlying problem for maximum efficiency. However, with the shift towards multi- and many-core computing architectures, the design of sufficiently parallel preconditioners is increasingly challenging.
This work presents a parallel preconditioning scheme for a state-of-the-art semiconductor device simulator and allows for the acceleration of the iterative solution process of the resulting system of linear equations. The method is based on physical properties of the underlying system of partial differential equations and results in a block preconditioner scheme, where each block can be computed in parallel by established serial preconditioners. The efficiency of the proposed scheme is confirmed by numerical experiments using a serial incomplete LU factorization preconditioner, which is accelerated by one order of magnitude on both multi-core central processing units and graphics processing units with the proposed scheme.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Boost C++ libraries, http://www.boost.org/
Bordawekar, R., Bondhugula, U., Rao, R.: Can CPUs Match GPUs on Performance with Productivity? Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU. Technical report, IBM T. J. Watson Research Center (2010)
Cusp Library, http://code.google.com/p/cusp-library/
Gnudi, A., Ventura, D., Baccarani, G.: One-dimensional Simulation of a Bipolar Transistor by means of Spherical Harmonics Expansion of the Boltzmann Transport Equation. In: Proc. SISDEP, vol. 4, pp. 205–213 (1991)
Gnudi, A., Ventura, D., Baccarani, G., Odeh, F.: Two-dimensional MOSFET Simulation by Means of a Multidimensional Spherical Harmonics Expansion of the Boltzmann Transport Equation. Solid-State Electr. 36(4), 575–581 (1993)
Grote, M.J., Huckle, T.: Parallel Preconditioning with Sparse Approximate Inverses. SIAM J. Sci. Comput. 18, 838–853 (1997)
Haase, G., Liebmann, M., Douglas, C., Plank, G.: A Parallel Algebraic Multigrid Solver on Graphics Processing Units. In: Zhang, W., Chen, Z., Douglas, C.C., Tong, W. (eds.) HPCA 2009. LNCS, vol. 5938, pp. 38–47. Springer, Heidelberg (2010)
Heuveline, V., Lukarski, D., Weiss, J.P.: Enhanced Parallel ILU(p)-based Preconditioners for Multi-core CPUs and GPUs – The Power(q)-pattern Method. EMCL Preprint 2011-08, EMCL (2011)
Hong, S.M., Jungemann, C.: A Fully Coupled Scheme for a Boltzmann-Poisson Equation Solver based on a Spherical Harmonics Expansion. J. Comp. Electr. 8, 225–241 (2009)
Jungemann, C., Pham, A.T., Meinerzhagen, B., Ringhofer, C., Bollhöfer, M.: Stable Discretization of the Boltzmann Equation based on Spherical Harmonics, Box Integration, and a Maximum Entropy Dissipation Principle. J. Appl. Phys. 100(2), 024502–+ (2006)
Khronos Group. OpenCL, http://www.khronos.org/opencl/
MAGMA library, http://icl.cs.utk.edu/magma/
Nath, R., Tomov, S., Dongarra, J.: An Improved MAGMA GEMM For Fermi Graphics Processing Units. Intl. J. HPC Appl. 24(4), 511–515 (2010)
NVIDIA CUDA, http://www.nvidia.com/
OpenMP, http://openmp.org/
Rupp, K., Jüngel, A., Grasser, T.: Matrix Compression for Spherical Harmonics Expansions of the Boltzmann Transport Equation for Semiconductors. J. Comp. Phys. 229(23), 8750–8765 (2010)
Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. Society for Industrial and Applied Mathematics (2003)
Vassilevski, P.S.: Multilevel Block Factorization Preconditioners. Springer, Heidelberg (2008)
ViennaCL, http://viennacl.sourceforge.net/
van der Vorst, H.A.: Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Non-Symmetric Linear Systems. SIAM J. Sci. and Stat. Comp. 12, 631–644 (1992)
Weinbub, J., Rupp, K., Selberherr, S.: Distributed Heterogenous High-Performance Computing with ViennaCL. In: Abstracts Intl. Conf. LSSC, pp. 88–90 (2011)
Xu, K., Ding, D.Z., Fan, Z.H., Chen, R.S.: FSAI Preconditioned CG Algorithm combined with GPU Technique for the Finite Element Analysis of Electromagnetic Scattering Problems. Finite Elem. Anal. Des. 47, 387–393 (2011)
Zang, W., Du, G., Li, Q., Zhang, A., Mo, Z., Liu, X., Zhang, P.: A 3D Parallel Monte Carlo Simulator for Semiconductor Devices. In: Proc. IWCE, pp. 1–4 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rupp, K., Jüngel, A., Grasser, T. (2012). A GPU-Accelerated Parallel Preconditioner for the Solution of the Boltzmann Transport Equation for Semiconductors. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore - Challenge II. Lecture Notes in Computer Science, vol 7174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30397-5_13
Download citation
DOI: https://doi.org/10.1007/978-3-642-30397-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30396-8
Online ISBN: 978-3-642-30397-5
eBook Packages: Computer ScienceComputer Science (R0)