Interprocessor communication for high performance, explicit time integration

  • Original Article
  • Engineering with Computers

Abstract

Parallel, explicit finite element analysis relies almost exclusively on point-to-point interprocessor communication. On multicore architectures, however, point-to-point communication exhibits large performance variability because of shared caches and sockets. The interprocessor communication required during the solution phase must therefore be designed to achieve a high degree of scalability and performance for explicit time integration operators. An analysis of point-to-point communication across hardware platforms, communication library implementations, and message sizes demonstrates the need for a flexible software design that allows for optimization. Autotuning modules and preliminary performance tests are necessary to identify the optimal combination of calls. Performance differences of point-to-point messaging on multicore machines are illustrated with a test that uses combinations of MPI communication calls. The differences are apparent when caches and sockets are shared among cores and for message sizes up to 1.5 MB. Alternative communication schemes are shown to perform faster depending on the architecture and message size. Nearly linear scalability of explicit time integration is demonstrated using these design techniques.
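To make the autotuning idea concrete, the following is a minimal sketch in C with MPI, not the authors' implementation: it times two common point-to-point exchange patterns between neighboring ranks and reports which is faster, in the spirit of the preliminary performance tests described above. The ring-neighbor pattern, 1 MB message size, and trial count are illustrative assumptions.

    /*
     * Hedged sketch: benchmark two point-to-point exchange patterns and
     * pick the faster one, as an autotuning module might. Ring neighbors
     * stand in for mesh-partition neighbors (an assumption).
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBYTES (1 << 20)  /* 1 MB message, within the sizes studied */
    #define TRIALS 50

    /* Pattern A: blocking combined send/receive. */
    static double time_sendrecv(char *sbuf, char *rbuf, int left, int right)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < TRIALS; i++)
            MPI_Sendrecv(sbuf, NBYTES, MPI_BYTE, right, 0,
                         rbuf, NBYTES, MPI_BYTE, left, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        return (MPI_Wtime() - t0) / TRIALS;
    }

    /* Pattern B: post the receive first, then a nonblocking send. */
    static double time_isend_irecv(char *sbuf, char *rbuf, int left, int right)
    {
        MPI_Request req[2];
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < TRIALS; i++) {
            MPI_Irecv(rbuf, NBYTES, MPI_BYTE, left, 0, MPI_COMM_WORLD, &req[0]);
            MPI_Isend(sbuf, NBYTES, MPI_BYTE, right, 0, MPI_COMM_WORLD, &req[1]);
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        }
        return (MPI_Wtime() - t0) / TRIALS;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Exchange with ring neighbors. */
        int left = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        char *sbuf = calloc(NBYTES, 1);
        char *rbuf = calloc(NBYTES, 1);

        double ta = time_sendrecv(sbuf, rbuf, left, right);
        double tb = time_isend_irecv(sbuf, rbuf, left, right);

        if (rank == 0)
            printf("Sendrecv %.3e s/iter, Isend/Irecv %.3e s/iter -> use %s\n",
                   ta, tb, tb < ta ? "nonblocking" : "blocking");

        free(sbuf);
        free(rbuf);
        MPI_Finalize();
        return 0;
    }

In an explicit code, such a test could be run once during setup for each neighbor pairing and message size; the winning pattern would then be used for the ghost-data exchange in every time step.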

Acknowledgments

This research has been supported by the National Science Foundation under grants EEC-0121989 and OCI-0749227. The simulations were performed under an allocation approved by the Cyberinfrastructure Partnership for TeraGrid resources under award ECS080001. The awards and grants are greatly appreciated.

Author information

Correspondence to Georgios Petropoulos.

Cite this article

Petropoulos, G., Fenves, G.L. Interprocessor communication for high performance, explicit time integration. Engineering with Computers 26, 149–157 (2010). https://doi.org/10.1007/s00366-010-0174-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00366-010-0174-x

Keywords

Navigation