
Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique

  • Original Paper
  • Published in: Engineering with Computers 30, 535–547 (2014)

Abstract

Solving partial differential equations with finite element (FE) methods on unstructured meshes that contain billions of elements is computationally very challenging. Although parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization because of their unstructured data access patterns. In this work, we use Hilbert space-filling curves to reorder the way the mesh vertices and elements are stored in memory, which improves cache utilization in FE methods for unstructured meshes. This reordering enumerates the mesh elements such that parallel threads access shared vertices at different times, reducing the time wasted waiting to acquire the locks that guard atomic regions. Further, when the linear system resulting from the FE analysis is solved with the preconditioned conjugate gradient method, the block-Jacobi preconditioner also performs better because more nonzeros lie near the diagonal of the stiffness matrix. Our results show that the reordering reduces the L1 and L2 cache miss rates in the stiffness matrix assembly step by about 50% and 10%, respectively, on a single-core processor. It also reduces the number of iterations required to solve the linear system by about 5%. Overall, the reordering reduces the time to assemble the stiffness matrix and solve the linear system on a 4-socket, 48-core multiprocessor by about 20%.
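To make the reordering idea concrete, the following is a minimal two-dimensional sketch, not the authors' implementation. It assumes vertices are given as (x, y) pairs and elements as tuples of vertex ids; the names `hilbert_index` and `hilbert_reorder`, the grid resolution `order`, and the choice to sort elements by centroid are illustrative assumptions that do not come from the paper.

```python
def hilbert_index(x, y, order):
    """Distance along a 2D Hilbert curve of grid cell (x, y).

    x and y are integers in [0, 2**order). Standard bitwise conversion.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_reorder(vertices, elements, order=10):
    """Renumber vertices and elements along a Hilbert curve (assumed scheme).

    vertices: list of (x, y) floats; elements: list of tuples of vertex ids.
    Returns (new_vertices, new_elements) with connectivity remapped.
    """
    n = 1 << order
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    def quantize(c, lo, hi):
        # Map a coordinate onto the integer grid [0, n - 1].
        if hi == lo:
            return 0
        return min(n - 1, int((c - lo) / (hi - lo) * n))

    def key(x, y):
        return hilbert_index(quantize(x, xmin, xmax),
                             quantize(y, ymin, ymax), order)

    # Vertices that are close in space receive nearby indices in memory.
    vert_perm = sorted(range(len(vertices)), key=lambda i: key(*vertices[i]))
    new_id = {old: new for new, old in enumerate(vert_perm)}
    new_vertices = [vertices[old] for old in vert_perm]

    def centroid(elem):
        k = len(elem)
        return (sum(vertices[v][0] for v in elem) / k,
                sum(vertices[v][1] for v in elem) / k)

    # Elements are sorted by the curve position of their centroid (assumption).
    elem_perm = sorted(range(len(elements)),
                       key=lambda j: key(*centroid(elements[j])))
    new_elements = [tuple(new_id[v] for v in elements[j]) for j in elem_perm]
    return new_vertices, new_elements


if __name__ == "__main__":
    verts = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    elems = [(0, 2, 1), (0, 1, 3)]
    print(hilbert_reorder(verts, elems, order=4))
```

Because the vertex numbering also fixes the row and column order of the assembled stiffness matrix, a locality-preserving numbering of this kind tends to move nonzeros toward the diagonal, which is the effect the abstract credits for the improved block-Jacobi preconditioner.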



Acknowledgments

The authors would like to thank Rick Schraf and Todd Fetterolf for creating the CAD model of the IVC filter domain. The work of the first author is supported in part by the NSF Grant CNS-0720749, NSF CAREER Award OCI-1054459, NIH/NIGMS Center for Integrative Biomedical Computing, 2P41 RR0112553-12, and DOE NET DE-EE0004449 grants. The work of the third author was supported in part by NSF Grant CNS-0720749 and NSF CAREER Award ACI-1330056 (formerly OCI-1054459). The work of the second and fourth authors was supported in part by NSF grants 1147388, 1152479, 1017882, 0963839, 0720645, 0811687, 0702519, and a grant from Microsoft Corporation. The authors would also like to thank the two anonymous referees for their comments, which improved the paper.

Author information

Corresponding author

Correspondence to Shankar P. Sastry.


Cite this article

Sastry, S.P., Kultursay, E., Shontz, S.M. et al. Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique. Engineering with Computers 30, 535–547 (2014). https://doi.org/10.1007/s00366-014-0363-0


  • DOI: https://doi.org/10.1007/s00366-014-0363-0
