Abstract
The effect of two soft fault error models on the convergence of the parallel flexible GMRES (FGMRES) iterative method, applied to an elliptic PDE problem on a regular grid, is evaluated. We consider two types of preconditioners: an incomplete LU factorization with dual threshold (ILUT), and an algebraic recursive multilevel solver (ARMS) combined with a random butterfly transformation (RBT). The experiments quantify the difference between the two soft fault error models considered in this study and compare their potential impact on convergence.
This work was supported in part by the Air Force Office of Scientific Research under the AFOSR award FA9550-12-1-0476, by the U.S. Department of Energy, Office of Advanced Scientific Computing Research, through the Ames Laboratory, operated by Iowa State University under contract No. DE-AC02-07CH11358, and by the U.S. Department of Defense High Performance Computing Modernization Program, through a HASI grant and the ILIR/IAR program at NSWC Dahlgren. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and of Old Dominion University, which operates the Turing High Performance Computing Cluster.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Coleman, E., Jamal, A., Baboulin, M., Khabou, A., Sosonkina, M. (2018). A Comparison of Soft-Fault Error Models in the Parallel Preconditioned Flexible GMRES. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_4
DOI: https://doi.org/10.1007/978-3-319-78024-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science (R0)