skip to main content
10.1145/2618128.2618132acmconferencesArticle/Chapter ViewAbstractPublication PagesmspConference Proceedingsconference-collections
research-article

A study of connected object locality in NUMA heaps

Published: 13 June 2014 Publication History

Abstract

Reference locality is vital to the performance of parallel Garbage Collection (GC) running on Non-Uniform Memory Access (NUMA) machines. A GC thread may trace remotely placed objects that descend from the root set or, for load balance, a GC thread may steal non-local objects from other threads' work lists. Processing distant live objects could introduce contention in the interconnect links between memory nodes and it could increase memory access latency. Researchers have proposed various techniques to improve GC tracing locality. However, few studies attempt to optimize the locality of connected objects in NUMA object graph. In this paper, we study the locality of a rooted subgraph, a unit of object connectivity in the object graph. A rooted subgraph is a set of references containing one root reference, and every reference is reachable from the root. We empirically study the locality of rooted subgraphs of DaCapo and SPECjbb2005 benchmark suites. The results show that more than 80% of objects in a rooted subgraph are located in the same memory node as the root object. We then propose a GC locality optimization that uses the root memory node as a heuristic to guide GC threads processing local objects.

References

[1]
T. A. Anderson. Optimizations in a Private Nursery-based Garbage Collector. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 21--30, New York, NY, USA, 2010. ACM.
[2]
S. Auhagen, L. Bergstrom, M. Fluet, and J. Reppy. Garbage Collection for Multicore NUMA Machines. In Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 51--57, New York, NY, USA, 2011. ACM.
[3]
M. Awasthi, D. W. Nellans, K. Sudan, R. Balasubramonian, and A. Davis. Handling the Problems and Opportunities Posed by Multiple On-chip Memory Controllers. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pages 319--330, New York, NY, USA, 2010. ACM.
[4]
K. Barabash and E. Petrank. Tracing Garbage Collection on Highly Parallel Platforms. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, pages 1--10, New York, NY, USA, 2010. ACM.
[5]
S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The Dacapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM.
[6]
C. J. Cheney. A Nonrecursive List Compacting Algorithm. Communications of the ACM, 13(11):677--678, 1970.
[7]
Y. Chicha and S. M. Watt. A Localized Tracing Scheme Applied to Garbage Collection. Programming Languages and Systems, 4279:323--339, 2006.
[8]
T. Domani, G. Goldshtein, E. K. Kolodner, E. Lewis, E. Petrank, and D. Sheinwald. Thread-local Heaps for Java. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 76--87, New York, NY, USA, 2002. ACM.
[9]
T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, pages 1--14, San Jose, CA, 1997.
[10]
H. Eran and E. Petrank. A Study of Data Structures with a Deep Heap Shape. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, pages 21--28, Seattle, Washington, 2013. ACM.
[11]
C. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium, page 21, Monterey, California, 2001. USENIX Association.
[12]
L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM.
[13]
M. Hirzel, J. Henkel, A. Diwan, and M. Hind. Understanding the Connectivity of Heap Objects. In Proceedings of the 3rd International Symposium on Memory Management, ISMM '02, pages 36--49, New York, NY, USA, 2002. ACM.
[14]
X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng. The Garbage Collection Advantage: Improving Program Locality. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 69--80, New York, NY, USA, 2004. ACM.
[15]
R. Jones, A. Hosking, and E. Moss. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman & Hall/CRC, 1st edition, 2011.
[16]
R. Jones and A. King. A Fast Analysis for Thread-local Garbage Collection with Dynamic Class Loading. In Fifth IEEE International Workshop on Source Code Analysis and Manipulation, pages 129--138, 2005.
[17]
M. S. Lam, P. R. Wilson, and T. G. Moher. Object Type Directed Garbage Collection To Improve Locality. In Proceedings of the International Workshop on Memory Management, IWMM '92, pages 404--425, London, UK, UK, 1992. Springer-Verlag.
[18]
Z. Majo and T. R. Gross. Memory Management in NUMA Multicore Systems: Trapped Between Cache Contention and Interconnect Overhead. In Proceedings of the international symposium on Memory management - ISMM '11, page 11, New York, NY, USA, 2011. ACM.
[19]
S. Marlow and S. Peyton Jones. Multicore Garbage Collection with Local Heaps. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 21--32, New York, NY, USA, 2011. ACM.
[20]
C. E. Oancea, A. Mycroft, and S. M. Watt. A New Approach to Parallelising Tracing Algorithms. In Proceedings of the 2009 International Symposium on Memory Management, ISMM '09, pages 10--19, New York, NY, USA, 2009. ACM.
[21]
T. Ogasawara. Numa-aware Memory Manager with Dominant-thread-based Copying GC. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '09, pages 377--390, New York, NY, USA, 2009. ACM.
[22]
Y. Shuf, M. Gupta, R. Bordawekar, and J. P. Singh. Exploiting Prolific Types for Memory Management and Optimizations. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '02, pages 295--306, New York, NY, USA, 2002. ACM.
[23]
Y. Shuf, M. Gupta, H. Franke, A. Appel, and J. P. Singh. Creating and Preserving Locality of Java Applications at Allocation and Garbage Collection Times. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '02, pages 13--25, New York, NY, USA, 2002. ACM.
[24]
SPECjbb2005. Standard Performance Evaluation Corporation. Available at: http://www.spec.org/jbb2005.
[25]
B. Steensgaard. Thread-specific Heaps for Multi-threaded Programs. In Proceedings of the 2Nd International Symposium on Memory Management, ISMM '00, pages 18--24, New York, NY, USA, 2000. ACM.
[26]
M. M. Tikir and J. K. Hollingsworth. Numa-aware Java Heaps for Server Applications. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS '05, pages 108.2--, Washington, DC, USA, 2005. IEEE Computer Society.
[27]
P. R. Wilson, M. S. Lam, and T. G. Moher. Effective "Static-graph" Reorganization to Improve Locality in Garbage-collected Systems. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, PLDI '91, pages 177--191, New York, NY, USA, 1991. ACM.
[28]
J. Zhou and B. Demsky. Memory Management for Many-core Processors with Software Configurable Locality Policies. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 3--14, New York, NY, USA, 2012. ACM.

Cited By

View all
  • (2015)Topology-Aware Parallelism for NUMA Copying CollectorsRevised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 951910.1007/978-3-319-29778-1_12(191-205)Online publication date: 9-Sep-2015

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
MSPC '14: Proceedings of the workshop on Memory Systems Performance and Correctness
June 2014
61 pages
ISBN:9781450329170
DOI:10.1145/2618128
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. locality
  2. rooted subgraph

Qualifiers

  • Research-article

Funding Sources

Conference

PLDI '14
Sponsor:

Acceptance Rates

MSPC '14 Paper Acceptance Rate 6 of 20 submissions, 30%;
Overall Acceptance Rate 6 of 20 submissions, 30%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 08 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2015)Topology-Aware Parallelism for NUMA Copying CollectorsRevised Selected Papers of the 28th International Workshop on Languages and Compilers for Parallel Computing - Volume 951910.1007/978-3-319-29778-1_12(191-205)Online publication date: 9-Sep-2015

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media