ABSTRACT
Experience with the Intel Xeon Phi suggests that NUMA alone is inadequate for assigning pages to devices in heterogeneous memory systems. We argue that this is because NUMA relies on a single distance metric between each pair of domains (i.e., the number of devices "in between" them), whereas relationships between heterogeneous domains can and should be characterized by multiple metrics (e.g., latency, bandwidth, and capacity). We therefore propose elaborating the concept of NUMA distance to give better and more intuitive control over page placement, while retaining most of the simplicity of the NUMA abstraction. This can be implemented with minor modifications to the Linux kernel, leaving room for further development by hardware vendors.
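To make the idea of a multi-metric distance concrete, here is a minimal sketch of a placement routine that ranks memory nodes by a caller-chosen metric instead of by one scalar distance. It assumes a hypothetical per-node attribute table. The names `NodeAttrs`, `NODES`, and `pick_node`, the two-node layout (a large conventional DRAM node and a small high-bandwidth node, loosely modeled on a Xeon Phi-style system), and all numeric values are illustrative assumptions. They are not interfaces or measurements from the paper or from the Linux kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAttrs:
    """Hypothetical attributes of one memory node, as seen from one CPU domain."""
    node: int
    latency_ns: float      # access latency; lower is better
    bandwidth_gbs: float   # sustainable bandwidth; higher is better
    free_gb: float         # remaining capacity on the node


# Illustrative placeholder values only (assumed, not measured):
# node 0 is large conventional DRAM, node 1 is a small high-bandwidth memory.
NODES = [
    NodeAttrs(node=0, latency_ns=130.0, bandwidth_gbs=90.0, free_gb=96.0),
    NodeAttrs(node=1, latency_ns=150.0, bandwidth_gbs=400.0, free_gb=16.0),
]


def pick_node(nodes, metric, size_gb):
    """Return the id of the node that best satisfies `metric` and can hold size_gb.

    metric is one of "latency", "bandwidth", or "capacity".
    Capacity is treated as a hard filter; returns None if no node fits.
    """
    candidates = [n for n in nodes if n.free_gb >= size_gb]
    if not candidates:
        return None
    if metric == "latency":
        best = min(candidates, key=lambda n: n.latency_ns)
    elif metric == "bandwidth":
        best = max(candidates, key=lambda n: n.bandwidth_gbs)
    elif metric == "capacity":
        best = max(candidates, key=lambda n: n.free_gb)
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return best.node


if __name__ == "__main__":
    print(pick_node(NODES, "bandwidth", 8))   # → 1 (fits in the high-bandwidth node)
    print(pick_node(NODES, "bandwidth", 32))  # → 0 (too large, falls back to DRAM)
    print(pick_node(NODES, "latency", 8))     # → 0
```

A single scalar distance imposes one fixed ordering of nodes on every allocation. In this sketch, the same two nodes are ranked differently depending on whether the caller asks for bandwidth or latency, and capacity determines which nodes are eligible at all. This illustrates the abstract's point that these metrics can disagree.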
Index Terms
- NUMA Distance for Heterogeneous Memory