Abstract
Parallel dynamic meshes are essential for computational simulations of large-scale scientific applications involving motion. To address this need, we propose parallel LBWARP, a parallel log barrier-based tetrahedral mesh warping algorithm for distributed memory machines. Our algorithm is a general-purpose, geometric mesh warping algorithm that parallelizes the sequential LBWARP algorithm proposed by Shontz and Vavasis. The first step of the algorithm computes a set of local weights for each interior node, which describe the relative distances of the node to each of its neighbors. This weight computation is the most time-consuming step of the parallel algorithm. Our choice of mesh partition, together with the corresponding distribution of data and assignment of tasks to processors, avoids communication among processors, so the weights are computed in an embarrassingly parallel manner. Once this representation of the initial mesh is determined, a target deformation of the boundary is applied, also in an embarrassingly parallel manner. Finally, new coordinates of the interior nodes are obtained by solving a system of linear equations with multiple right-hand sides that is based on the weights and boundary deformation. This linear system can be solved using one of three parallel sparse linear solvers, namely the distributed block BiCG, block GMRES, or LU algorithm, all of which support the solution of linear systems with multiple right-hand side vectors. Our numerical results demonstrate good efficiency and strong scalability of parallel LBWARP on up to 64 processors, as the experiments show close to linear speedup in all cases. Weak scalability is also demonstrated. The performance of the parallel sparse linear solvers depends on factors such as the mesh size, the amount of available memory, and the number of processors. For example, the distributed LU algorithm gives better performance on small meshes, whereas the distributed block BiCG and distributed block GMRES algorithms yield better performance when the amount of available memory is limited. Finally, we demonstrate the performance of parallel LBWARP for a sequence of mesh deformations, a setting in which the runtime of the overall algorithm can be significantly reduced. When applied to k deformations with the distributed LU linear solver, parallel LBWARP reuses the weight matrix that was computed during the first deformation. This gives close to k-time performance for sufficiently many deformations.
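The final solve and its reuse across a sequence of deformations can be sketched on a toy problem. The example below is a minimal serial illustration, not the authors' distributed implementation: it assumes a chain of three interior nodes between two boundary nodes, with hand-picked weights of 1/2 in place of log barrier weights, and it writes the system as (I − W_II) X_I = W_IB X_B, which is an assumed form. The names `lu_factor`, `lu_solve`, and `warp` are hypothetical. Reusing the LU factors, rather than only the matrix, is also an assumption of this sketch.

```python
def lu_factor(a):
    """LU-factor a dense square matrix with partial pivoting.

    Returns (lu, perm): L (unit diagonal) and U packed in one matrix,
    plus the row permutation that was applied.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, rhs):
    """Solve A X = rhs for a block of right-hand sides (one column per coordinate)."""
    n, m = len(lu), len(rhs[0])
    x = [rhs[perm[i]][:] for i in range(n)]   # apply the row permutation
    for i in range(n):                        # forward substitution with L
        for k in range(i):
            for c in range(m):
                x[i][c] -= lu[i][k] * x[k][c]
    for i in reversed(range(n)):              # back substitution with U
        for k in range(i + 1, n):
            for c in range(m):
                x[i][c] -= lu[i][k] * x[k][c]
        for c in range(m):
            x[i][c] /= lu[i][i]
    return x


# Toy weights (assumed): each interior node is the average of its two neighbors.
w_ii = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]  # interior-interior
w_ib = [[0.5, 0.0], [0.0, 0.0], [0.0, 0.5]]                 # interior-boundary

# System matrix I - W_II, factored once and reused for every deformation.
a = [[(1.0 if i == j else 0.0) - w_ii[i][j] for j in range(3)] for i in range(3)]
lu, perm = lu_factor(a)


def warp(boundary):
    """Return new interior coordinates for the given boundary coordinates."""
    dim = len(boundary[0])
    rhs = [[sum(w_ib[i][b] * boundary[b][c] for b in range(len(boundary)))
            for c in range(dim)] for i in range(len(w_ib))]
    return lu_solve(lu, perm, rhs)


# Two boundary configurations: the undeformed chain and a stretched, tilted one.
deformations = [
    [[0.0, 0.0], [4.0, 0.0]],
    [[0.0, 0.0], [8.0, 4.0]],
]
warped = [warp(boundary) for boundary in deformations]
```

Each call to `warp` costs only the two triangular solves, and the x and y coordinates are handled as two right-hand sides of the same system. A production code would use a sparse distributed solver rather than this dense serial one.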
Notes
The LU factorization exists only if A is nonsingular. If an element on the diagonal of A is zero or nearly zero, partial pivoting is required.
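As a small illustration of the pivoting remark (the matrix is chosen here for illustration and does not come from the paper), consider a nonsingular matrix with a zero in its leading diagonal entry. Elimination cannot start from that entry, but after exchanging the two rows with a permutation matrix \(P\), the factorization is immediate:

$$\begin{aligned} A&= \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \det A = -1 \ne 0, \\ PA&= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{L} \underbrace{\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}}_{U}. \end{aligned}$$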
The MR mesh quality metric [29] is given by
$$\begin{aligned} \eta&= \frac{12(3v)^{2/3}}{\sum _{0 \le i < j \le 3} l_{ij}^2}, \end{aligned} \tag{5}$$
where \(v\) and \(l_{ij}\) denote the volume and the edge lengths of the tetrahedron, respectively. Note that the ideal mesh quality occurs when \(\eta = 1,\) and \(\eta = 0\) denotes a degenerate tetrahedron. The range of the metric is 0 to 1; higher values denote better quality.
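The metric in Eq. (5) can be evaluated directly from the four vertex coordinates of a tetrahedron. The following is a minimal sketch; the function name `mr_quality` is hypothetical, and the sketch assumes the unsigned volume, so it does not distinguish inverted elements from valid ones.

```python
from itertools import combinations


def mr_quality(p0, p1, p2, p3):
    """Evaluate eta = 12 (3 v)^(2/3) / sum of squared edge lengths."""
    verts = (p0, p1, p2, p3)
    # Edge vectors emanating from p0.
    a, b, c = ([q[k] - p0[k] for k in range(3)] for q in (p1, p2, p3))
    # Volume is |det[a b c]| / 6 (assumption: unsigned volume).
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    v = abs(det) / 6.0
    # Sum of squared lengths over the six edges l_ij, 0 <= i < j <= 3.
    sum_sq = sum(sum((p[k] - q[k]) ** 2 for k in range(3))
                 for p, q in combinations(verts, 2))
    if sum_sq == 0.0:
        return 0.0  # all four vertices coincide
    return 12.0 * (3.0 * v) ** (2.0 / 3.0) / sum_sq


# A regular tetrahedron (all edges of equal length) attains the ideal value.
eta_regular = mr_quality((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
# Four coplanar vertices have zero volume, hence zero quality.
eta_flat = mr_quality((0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0))
# A corner (right-angled) tetrahedron falls strictly between the two.
eta_corner = mr_quality((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
```

For the regular tetrahedron, every squared edge length is 8 and \(3v = 8\), so \(\eta = 12 \cdot 4 / 48 = 1\) up to rounding.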
References
George A (1973) Nested dissection of a regular finite element mesh. SIAM J Numer Anal 10:345–363
Amano A, Kanda K, Shibayama T, Kamei Y, Matsuda T (2007) Model generation interface for simulation of left ventricular motion. Electron Commun Jpn (Part II: Electronics) 90(12):87–98
Antonopoulos C, Ding X, Chernikov A, Blagojevic F, Nikolopoulos D, Chrisochoides N (2005) Multigrain parallel Delaunay mesh generation. In: Proceedings of the 19th annual international conference on supercomputing. ACM Press, New York, pp 367–376
Barrett R, Berry MW, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Van der Vorst H (1994) Templates for the solution of linear systems: building blocks for iterative methods, vol 43. SIAM, Philadelphia
Bavier E, Hoemmen M, Rajamanickam S, Thornquist H (2012) Amesos2 and Belos: direct and iterative solvers for large sparse linear systems. Sci Program 20(3):241–255
Benítez D, Rodríguez E, Escobar JM, Montenegro R (2014) Performance evaluation of a parallel algorithm for simultaneous untangling and smoothing of tetrahedral meshes. In: Proceedings of the 22nd international meshing roundtable. Springer, Cham, pp 579–598
Bertsekas DP (1982) Projected Newton methods for optimization problems with simple constraints. SIAM J Control Optim 20(2):221–246
Botsch M, Bommes D, Kobbelt L (2005) Efficient linear system solvers for mesh processing. In: Proceedings of the 11th IMA international conference on the mathematics of surfaces, pp 62–83
Castanos J, Savage J (1999) Pared: a framework for the adaptive solution of PDEs. In: Proceedings of the 8th IEEE symposium on high performance distributed computing, pp 133–140
Chernikov A, Chrisochoides N (2006) Parallel guaranteed quality Delaunay uniform mesh refinement. SIAM J Sci Comput 28:1907–1926
Chrisochoides N (2005) A survey of parallel mesh generation methods. Tech. Rep. SC-2005-09, Brown University
Chrisochoides N, Chernikov A, Fedorov A, Kot A, Linardakis L, Foteinos P (2009) Towards exascale parallel Delaunay mesh generation. In: Proceedings of the 18th international meshing roundtable, pp 319–336
CyberSTAR: a scalable terascale advanced resource for discovery through computing. The Pennsylvania State University. https://ics.psu.edu/advanced-cyberinfrastructure/ics-aci-infrastructure/
De Cougny H, Shephard M (1999) Parallel refinement and coarsening of tetrahedral meshes. Int J Numer Methods Eng 46(7):1101–1125
Estruch O, Lehmkuhl O, Borrell R, Segarra CP, Oliva A (2013) A parallel radial basis function interpolation method for unstructured dynamic meshes. Comput Fluids 80:44–54. (Selected contributions of the 23rd international conference on parallel fluid dynamics ParCFD2011)
Fletcher R (1976) Conjugate gradient methods for indefinite systems. In: Numerical analysis. Springer, Berlin, Heidelberg, pp 73–89
Galtier J, George P (1997) Prepartitioning as a way to mesh subdomains in parallel. In: Proceedings of the ASME/ASCE/SES Summer Meeting, special symposium on trends in unstructured mesh generation, pp 107–122
Gerhold T, Neumann J (2008) The parallel mesh deformation of the DLR TAU-code. In: New results in numerical and experimental fluid mechanics VI, notes on numerical fluid mechanics and multidisciplinary design, vol 96. Springer, Berlin, Heidelberg, pp 162–169
Gorman GJ, Rokos G, Southern J, Kelly PH (2015) Thread-parallel anisotropic mesh adaptation. In: New challenges in grid generation and adaptivity for scientific computing. Springer, pp 113–137
GrabCAD. https://grabcad.com
Guennebaud G, Jacob B et al (2010) Eigen v3. http://eigen.tuxfamily.org
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bureau Stand 49(6):409–436
Hunter PJ, Pullan AJ, Smaill BH (2003) Modeling total heart function. Annu Rev Biomed Eng 5(1):147–177
Ijiri T, Ashihara T, Umetani N, Igarashi T, Haraguchi R, Yokota H, Nakazawa K (2012) A kinematic approach for efficient and robust simulation of the cardiac beating motion. PLoS One 7(5):e36706
Interoperable technologies for advanced petascale simulations (ITAPS) center (2010). http://www.scidac.gov/math/ITAPS.html
Karypis G, Kumar V (1999) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392
Lachat C, Dobrzynski C, Pellegrini F (2014) Parallel mesh adaptation using parallel graph partitioning. In: 5th European conference on computational mechanics, vol 3. CIMNE-International Center for Numerical Methods in Engineering, pp 2612–2623
Li XS, Demmel JW (2003) SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans Math Softw 29(2):110–140
Liu A, Joe B (1994) Relationship between tetrahedron shape measures. BIT Numer Math 34(2):268–287
Löhner R (2013) A 2nd generation parallel advancing front grid generator. In: Proceedings of the 21st international meshing roundtable, pp 457–474
Löhner R, Camberos J, Marsha M (1990) Unstructured scientific computation on scalable multiprocessors. In: Mehrotra P, Saltz J (eds) Parallel unstructured grid generation. MIT Press, Cambridge, MA, pp 31–64
Löhner R, Cebral J (1999) Parallel advancing front grid generation. In: Proceedings of the 8th international meshing roundtable, pp 67–74
Lu Q, Shephard MS, Tendulkar S, Beall MW (2014) Parallel mesh adaptation for high-order finite element methods with curved element geometry. Eng Comput 30(2):271–286
Luke E, Collins E, Blades E (2012) A fast mesh deformation method using explicit interpolation. J Comput Phys 231:586–601
Nave D, Chrisochoides N, Chew L (2004) Guaranteed-quality parallel Delaunay refinement for restricted polyhedral domains. Comput Geom Theory Appl 28:191–215
O’Leary DP (1980) The block conjugate gradient algorithm and related methods. Linear Algebra Appl 29:293–322
Oliker L, Biswas R, Gabow H (2000) Parallel tetrahedral mesh adaptation with dynamic load balancing. Parallel Comput J 26:1583–1608
Park J, Shontz S, Drapaca C (2013) A combined level set/mesh warping algorithm for tracking brain and cerebrospinal fluid evolution in hydrocephalic patients. In: Image-based geometric modeling and mesh generation, Lecture notes in computational vision and biomechanics, vol 3, pp 107–141
Rivara M, Calderon C, Pizarro D, Fedorov A, Chrisochoides N (2006) Parallel decoupled terminal-edge bisection algorithm for 3D meshes. Eng Comput 22:111–119
Rivara M, Pizarro D, Chrisochoides N (2004) Parallel refinement of tetrahedral edges using terminal-edge bisection algorithm. In: Proceedings of the 13th international meshing roundtable
Saad Y, Schultz MH (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J Sci Stat Comput 7(3):856–869
Sastry SP, Shontz SM (2014) A parallel log-barrier method for mesh quality improvement and untangling. Eng Comput 30(4):503–515
Selwood P, Berzins M, Dew P (1997) 3D parallel mesh adaptivity: data-structures and algorithms. In: Proceedings of the 8th SIAM conference on parallel processing for scientific computing. SIAM
Selwood P, Verhoeven N, Nash J, Berzins M, Weatherill N, Dew P, Morgan K (1996) Parallel mesh generation and adaptivity: partitioning and analysis. In: Proceedings of 1996 parallel CFD conference
Shephard M, Flaherty J, Bottasso C, de Cougny H, Özturan C, Simone M (1997) Parallel automated adaptive analysis. Parallel Comput 23:1327–1347
Shontz S, Vavasis S (2003) A mesh warping algorithm based on weighted Laplacian smoothing. In: Proceedings of the 12th international meshing roundtable, pp 147–158
Shontz S, Vavasis S (2010) Analysis of and workarounds for element reversal for a finite element-based algorithm for warping triangular and tetrahedral meshes. BIT Numer Math 50:863–884
Si H (2015) TetGen: a Delaunay-based quality tetrahedral mesh generator. ACM Trans Math Softw 41:11
Simoncini V (1997) A stabilized QMR version of block BICG. SIAM J Matrix Anal Appl 18(2):419–434
Simoncini V, Gallopoulos E (1996) Convergence properties of block GMRES and matrix polynomials. Linear Algebra Appl 247:97–119
Trilinos Project. http://trilinos.org/
Tsai H, Wong A, Cai J, Zhu Y, Liu F (2001) Unsteady flow calculations with a parallel multiblock moving mesh algorithm. AIAA J 39:1021–1029
Williams R (1991) Adaptive parallel meshes with complex geometry. In: Numerical grid generation in computational fluid dynamics and related fields
Acknowledgements
The work of the first author was funded by a Royal Thai Government scholarship. The work of the second author was supported in part by NSF Grant CNS-0720749 and NSF CAREER Award ACI-1500487 (formerly ACI-1330054 and ACI-1054459). This work was also supported in part through instrumentation funded by the National Science Foundation through Grant ACI0821527. The authors wish to thank the two anonymous referees for their careful reading of the paper and for their helpful suggestions, which strengthened it.
About this article
Cite this article
Panitanarak, T., Shontz, S.M. A parallel log barrier-based mesh warping algorithm for distributed memory machines. Engineering with Computers 34, 59–76 (2018). https://doi.org/10.1007/s00366-017-0521-2
DOI: https://doi.org/10.1007/s00366-017-0521-2