research-article

A study of connected object locality in NUMA heaps

Author:

Khaled AlnowaiserAuthors Info & Claims

MSPC '14: Proceedings of the workshop on Memory Systems Performance and Correctness

Article No.: 1, Pages 1 - 9

https://doi.org/10.1145/2618128.2618132

Published: 13 June 2014 Publication History

Abstract

Reference locality is vital to the performance of parallel Garbage Collection (GC) running on Non-Uniform Memory Access (NUMA) machines. A GC thread may trace remotely placed objects that descend from the root set or, for load balance, a GC thread may steal non-local objects from other threads' work lists. Processing distant live objects could introduce contention in the interconnect links between memory nodes and it could increase memory access latency. Researchers have proposed various techniques to improve GC tracing locality. However, few studies attempt to optimize the locality of connected objects in NUMA object graph. In this paper, we study the locality of a rooted subgraph, a unit of object connectivity in the object graph. A rooted subgraph is a set of references containing one root reference, and every reference is reachable from the root. We empirically study the locality of rooted subgraphs of DaCapo and SPECjbb2005 benchmark suites. The results show that more than 80% of objects in a rooted subgraph are located in the same memory node as the root object. We then propose a GC locality optimization that uses the root memory node as a heuristic to guide GC threads processing local objects.

References

[1]

T. A. Anderson. Optimizations in a Private Nursery-based Garbage Collector. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 21--30, New York, NY, USA, 2010. ACM.

Digital Library

[2]

S. Auhagen, L. Bergstrom, M. Fluet, and J. Reppy. Garbage Collection for Multicore NUMA Machines. In Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 51--57, New York, NY, USA, 2011. ACM.

Digital Library

[3]

M. Awasthi, D. W. Nellans, K. Sudan, R. Balasubramonian, and A. Davis. Handling the Problems and Opportunities Posed by Multiple On-chip Memory Controllers. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pages 319--330, New York, NY, USA, 2010. ACM.

Digital Library

[4]

K. Barabash and E. Petrank. Tracing Garbage Collection on Highly Parallel Platforms. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 1--10, New York, NY, USA, 2010. ACM.

Digital Library

[5]

S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The Dacapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM.

Digital Library

[6]

C. J. Cheney. A Nonrecursive List Compacting Algorithm. Communications of the ACM, 13(11):677--678, 1970.

Digital Library

[7]

Y. Chicha and S. M. Watt. A Localized Tracing Scheme Applied to Garbage Collection. Programming Languages and Systems, 4279:323--339, 2006.

Digital Library

[8]

T. Domani, G. Goldshtein, E. K. Kolodner, E. Lewis, E. Petrank, and D. Sheinwald. Thread-local Heaps for Java. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 76--87, New York, NY, USA, 2002. ACM.

Digital Library

[9]

T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, pages 1--14, San Jose, CA, 1997.

Digital Library

[10]

H. Eran and E. Petrank. A Study of Data Structures with a Deep Heap Shape. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 21--28, Seattle, Washington, 2013. ACM.

Digital Library

[11]

C. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium, page 21, Monterey, California, 2001. USENIX Association.

Digital Library

[12]

L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM.

Digital Library

[13]

M. Hirzel, J. Henkel, A. Diwan, and M. Hind. Understanding the Connectivity of Heap Objects. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 36--49, New York, NY, USA, 2002. ACM.

Digital Library

[14]

X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng. The Garbage Collection Advantage: Improving Program Locality. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 69--80, New York, NY, USA, 2004. ACM.

Digital Library

[15]

R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman & Hall/CRC, 1st edition, 2011.

Digital Library

[16]

R. Jones and A. King. A Fast Analysis for Thread-local Garbage Collection with Dynamic Class Loading. In Fifth IEEE International Workshop on Source Code Analysis and Manipulation, pages 129--138, 2005.

Digital Library

[17]

M. S. Lam, P. R. Wilson, and T. G. Moher. Object Type Directed Garbage Collection To Improve Locality. In Proceedings of the International Workshop on Memory Management, IWMM '92, pages 404--425, London, UK, UK, 1992. Springer-Verlag.

Digital Library

[18]

Z. Majo and T. R. Gross. Memory Management in NUMA Multicore Systems: Trapped Between Cache Contention and Interconnect Overhead. In Proceedings of the international symposium on Memory management - ISMM '11, page 11, New York, NY, USA, 2011. ACM.

Digital Library

[19]

S. Marlow and S. Peyton Jones. Multicore Garbage Collection with Local Heaps. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 21--32, New York, NY, USA, 2011. ACM.

Digital Library

[20]

C. E. Oancea, A. Mycroft, and S. M. Watt. A New Approach to Parallelising Tracing Algorithms. In Proceedings of the 2009 International Symposium on Memory Management, ISMM '09, pages 10--19, New York, NY, USA, 2009. ACM.

Digital Library

[21]

T. Ogasawara. Numa-aware Memory Manager with Dominant-thread-based Copying GC. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM.

Digital Library

[22]

Y. Shuf, M. Gupta, R. Bordawekar, and J. P. Singh. Exploiting Prolific Types for Memory Management and Optimizations. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '02, pages 295--306, New York, NY, USA, 2002. ACM.

Digital Library

[23]

Y. Shuf, M. Gupta, H. Franke, A. Appel, and J. P. Singh. Creating and Preserving Locality of Java Applications at Allocation and Garbage Collection Times. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '02, pages 13--25, New York, NY, USA, 2002. ACM.

Digital Library

[24]

SPECjbb2005. Standard Performance Evaluation Corporation. Available at: http://www.spec.org/jbb2005.

[25]

B. Steensgaard. Thread-specific Heaps for Multi-threaded Programs. In Proceedings of the 2Nd International Symposium on Memory Management, ISMM '00, pages 18--24, New York, NY, USA, 2000. ACM.

Digital Library

[26]

M. M. Tikir and J. K. Hollingsworth. Numa-aware Java Heaps for Server Applications. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05, pages 108.2--, Washington, DC, USA, 2005. IEEE Computer Society.

Digital Library

[27]

P. R. Wilson, M. S. Lam, and T. G. Moher. Effective "Static-graph" Reorganization to Improve Locality in Garbage-collected Systems. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, PLDI '91, pages 177--191, New York, NY, USA, 1991. ACM.

Digital Library

[28]

J. Zhou and B. Demsky. Memory Management for Many-core Processors with Software Configurable Locality Policies. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 3--14, New York, NY, USA, 2012. ACM.

Digital Library

Cited By

Alnowaiser KSinger J(2015)Topology-Aware Parallelism for NUMA Copying CollectorsRevised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 951910.1007/978-3-319-29778-1_12(191-205)Online publication date: 9-Sep-2015
https://dl.acm.org/doi/10.1007/978-3-319-29778-1_12

Index Terms

A study of connected object locality in NUMA heaps

Recommendations

Creating and preserving locality of java applications at allocation and garbage collection times
OOPSLA '02: Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications

The growing gap between processor and memory speeds is motivating the need for optimization strategies that improve data locality. A major challenge is to devise techniques suitable for pointer-intensive applications. This paper presents two techniques ...
Thread-local heaps for Java
ISMM '02: Proceedings of the 3rd international symposium on Memory management

We present a memory management scheme for Java based on thread-local heaps. Assuming most objects are created and used by a single thread, it is desirable to free the memory manager from redundant synchronization for thread-local objects. Therefore, in ...
Creating and preserving locality of java applications at allocation and garbage collection times

The growing gap between processor and memory speeds is motivating the need for optimization strategies that improve data locality. A major challenge is to devise techniques suitable for pointer-intensive applications. This paper presents two techniques ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

MSPC '14: Proceedings of the workshop on Memory Systems Performance and Correctness

June 2014

61 pages

ISBN:9781450329170

DOI:10.1145/2618128

General Chair:
Jeremy Singer
University of Glasgow
,
Program Chairs:
Milind Kulkarni
Purdue University
,
Tim Harris
Oracle Labs

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2014

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Salman Bin Abdul-Aziz University

Conference

PLDI '14

Sponsor:

SIGPLAN

PLDI '14: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 13, 2014

Edinburgh, United Kingdom

Acceptance Rates

MSPC '14 Paper Acceptance Rate 6 of 20 submissions, 30%;

Overall Acceptance Rate 6 of 20 submissions, 30%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
169
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)0

Reflects downloads up to 09 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Alnowaiser KSinger J(2015)Topology-Aware Parallelism for NUMA Copying CollectorsRevised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 951910.1007/978-3-319-29778-1_12(191-205)Online publication date: 9-Sep-2015
https://dl.acm.org/doi/10.1007/978-3-319-29778-1_12

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten