
Dynamic Data Migration for Structured AMR Solvers

Published in: International Journal of Parallel Programming

Abstract

On cc-NUMA multiprocessors, the non-uniformity of main memory latencies motivates the co-location of threads and data. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement (AMR). The solver is parallelized using OpenMP, and the adaptive mesh refinement makes dynamic load balancing necessary. Because the runtime adaptation continually changes the memory access pattern, achieving a high degree of geographical locality is challenging. The main conclusions of the study are: (1) geographical locality is very important for the performance of the solver; (2) the performance can be improved significantly by dynamic page migration of misplaced data; (3) a migrate-on-next-touch directive works well, whereas the first-touch strategy is less advantageous for programs with dynamically changing memory access patterns; and (4) the overhead of such migration is low compared to the total execution time.
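
The contrast between first-touch placement and migrate-on-next-touch can be illustrated with a minimal sketch in C with OpenMP. This is not the authors' code: it assumes a Solaris-style madvise(3C) call with the MADV_ACCESS_LWP advice, which asks the operating system to migrate each page to the memory node of the next thread that touches it; the array, its size, and the shifted block ownership that stands in for repartitioning are all hypothetical.

#include <sys/types.h>
#include <sys/mman.h>
#include <omp.h>

#define N (1L << 24)

int main(void)
{
    /* Page-aligned allocation so the madvise() range below is valid. */
    double *u = mmap(NULL, N * sizeof *u, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (u == MAP_FAILED)
        return 1;

    /* First-touch placement: each thread initializes its own block,
       so those pages land in that thread's local memory. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        u[i] = 0.0;

    /* ... time stepping; after adaptive refinement the load balancer
       hands different index ranges to the threads, so many pages are
       now remote ("misplaced") for the threads that use them ... */

    /* Next-touch hint (Solaris madvise advice, assumed here): mark the
       pages so each one migrates to the node of the thread that
       touches it next. */
    madvise((caddr_t)u, N * sizeof *u, MADV_ACCESS_LWP);

    /* Stand-in for the post-repartitioning access pattern: each thread
       sweeps a shifted block.  This first sweep both computes and
       triggers the page migrations. */
    #pragma omp parallel
    {
        int  t   = omp_get_thread_num();
        int  nt  = omp_get_num_threads();
        long blk = N / nt;
        long lo  = (long)((t + 1) % nt) * blk;   /* shifted ownership */
        for (long i = lo; i < lo + blk; i++)
            u[i] += 1.0;
    }

    munmap(u, N * sizeof *u);
    return 0;
}

With first-touch alone, the pages placed during initialization stay where they were first touched even after the load balancer reassigns work, so threads keep accessing remote memory; the next-touch hint lets the first sweep after repartitioning move the pages to the nodes where they are now used, and the abstract reports that the cost of this migration is small relative to the total execution time.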



Author information


Corresponding author

Correspondence to Sverker Holmgren.


About this article

Cite this article

Nordén, M., Löf, H., Rantakokko, J. et al. Dynamic Data Migration for Structured AMR Solvers. Int J Parallel Prog 35, 477–491 (2007). https://doi.org/10.1007/s10766-007-0056-z
