Abstract
This paper describes and evaluates a system of dynamic memory migration for codes executing in a Non-Uniform Memory Access (NUMA) environment. The system uses information about the load imbalance within a workload to determine the affinity between the application's threads and regions of memory. This affinity information then serves as the basis for migration decisions, with the aim of minimising the NUMA distance between code and the memory it accesses. Results are presented which demonstrate the effectiveness of this technique in reducing the runtime of a set of representative HPC kernels.
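The abstract does not spell out how affinity is derived, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a one-dimensional array in which loop iteration i touches element i, a load balancer that hands thread t the half-open iteration range from bounds[t] to bounds[t+1], and a fixed mapping from threads to NUMA nodes. The names preferred_nodes, migration_plan, bounds, thread_node and the 4096-byte page size are all hypothetical.

```python
def preferred_nodes(bounds, thread_node, elem_size, page_size=4096):
    """For each page of the array, return the NUMA node of the thread
    whose iteration range overlaps that page the most (assumed policy)."""
    n_bytes = bounds[-1] * elem_size
    n_pages = -(-n_bytes // page_size)  # ceiling division
    preferred = []
    for p in range(n_pages):
        lo = p * page_size
        hi = min((p + 1) * page_size, n_bytes)
        best_thread, best_overlap = 0, -1
        for t in range(len(bounds) - 1):
            t_lo = bounds[t] * elem_size
            t_hi = bounds[t + 1] * elem_size
            # Number of bytes of this page touched by thread t.
            overlap = max(0, min(hi, t_hi) - max(lo, t_lo))
            if overlap > best_overlap:  # ties go to the lower thread id
                best_thread, best_overlap = t, overlap
        preferred.append(thread_node[best_thread])
    return preferred


def migration_plan(current_node, preferred):
    """List (page, from_node, to_node) for every page that is not
    already resident on its preferred node."""
    return [
        (page, cur, want)
        for page, (cur, want) in enumerate(zip(current_node, preferred))
        if cur != want
    ]
```

For instance, with 2048 eight-byte elements split evenly between two threads on nodes 0 and 1, the four pages are placed two per node. If the load balancer later shrinks the first thread's range to 512 iterations, comparing the old placement with the new preferences yields a single candidate migration: page 1 from node 0 to node 1. A real system would additionally weigh the cost of each migration against the expected saving in remote accesses.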
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Slavin, P., Freeman, L. (2008). Integrating Dynamic Memory Placement with Adaptive Load-Balancing for Parallel Codes on NUMA Multiprocessors. In: Luque, E., Margalef, T., Benítez, D. (eds) Euro-Par 2008 – Parallel Processing. Euro-Par 2008. Lecture Notes in Computer Science, vol 5168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85451-7_30
DOI: https://doi.org/10.1007/978-3-540-85451-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85450-0
Online ISBN: 978-3-540-85451-7
eBook Packages: Computer Science, Computer Science (R0)