Cyclic reference counting by typed reference fields

https://doi.org/10.1016/j.cl.2011.09.001

Abstract

Reference counting is a natural choice for real-time garbage collection, but the cycle collection phase required for the correctness of reference counting algorithms can introduce heavy scanning overheads. This degrades efficiency and inflates the pause time required for garbage collection. In this paper, we present two schemes to improve the efficiency of reference counting algorithms. First, to make better use of the semantics of a given program, we introduce a novel classification model to predict the behavior of objects precisely. Second, to reduce the scanning overheads, we propose an enhancement for cyclic reference counting algorithms that exploits the strongly-typed reference fields of the Java language. We implement our proposed algorithm in Jikes RVM and measure its performance over various Java benchmarks. Our results show that the number of scanned objects can be reduced by an average of 37.9% during the cycle collection phase.

Highlights

► We propose the DRC algorithm to improve cyclic reference counting.
► We introduce a novel classification model to predict the behavior of objects.
► The improvements come from identifying and disregarding unnecessary operations.

Introduction

Automatic dynamic memory management (a.k.a. garbage collection) has received a great deal of attention in recent years, especially in object-oriented language systems. It provides benefits in software development, testing, and security. Productivity improves when programmers are freed from managing objects during software development and when software execution is free from memory leaks. Today garbage collection is a core component of most object-oriented languages and is also supported by modern run-time environments (e.g. Java Virtual Machine and .NET Framework).

There are two general approaches to garbage collection: tracing and reference counting. Tracing collectors, which include mark-sweep, tricolor and generational/copying collectors, are often the choice for high-performance applications [9]. Reference counting was not widely embraced in the past, but has received renewed interest recently [5], [7], [11], [19], [22], [23], [27], [28], due to its tight data locality, naturally incremental behavior, and rapid reclamation of memory. In addition, reference counting yields short pauses for collection cycles, so it can be used for real-time applications and embedded systems.

Although reference counting [14], [18] has many innate advantages, it also has particular weaknesses that often limit its use. By the nature of the algorithm, two of its features, a naturally incremental workload and rapid memory reuse, are in conflict with each other. Most crucially, reference counting in its basic form is not complete: it cannot reclaim cyclic data structures [26], which leads to memory leaks. Cyclic data structures can be addressed in three general ways: static elimination of any possible cyclic structures, a backup tracing collector that periodically collects accumulated cyclic garbage, or special functionality built into the reference counting collector to handle garbage cycles. Static cycle elimination, when it is not done by program analysis in a compiler, can place an additional burden on the developer, undermining the stated benefits of garbage collection. A backup tracing collector can add significant complexity to the run-time system and inflate the pause times associated with garbage collection. These points suggest that cyclic reference counting, a functional extension of reference counting that collects cyclic data without compromising incremental behavior, would be an ideal solution.
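The incompleteness of basic reference counting can be seen in a few lines of code. The following is a minimal Java sketch of a toy heap with explicit counts; it is not taken from the paper, and the names NaiveRcDemo, Node, addRoot, addRef and release are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveRcDemo {
    /** Toy heap object with an explicit reference count (hypothetical). */
    static final class Node {
        final String name;
        int rc;                                  // number of incoming references
        boolean freed;                           // set when the object is reclaimed
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Account for a root (stack or global) reference to n. */
    static void addRoot(Node n) { n.rc++; }

    /** Store a reference from src to dst, incrementing dst's count. */
    static void addRef(Node src, Node dst) { src.fields.add(dst); dst.rc++; }

    /** Drop one reference to n; reclaim it recursively if the count reaches zero. */
    static void release(Node n) {
        if (--n.rc == 0) {
            n.freed = true;
            for (Node child : n.fields) release(child);
            n.fields.clear();
        }
    }

    public static void main(String[] args) {
        // Acyclic chain: root -> a -> b.
        Node a = new Node("a"), b = new Node("b");
        addRoot(a); addRef(a, b);
        release(a);
        System.out.println("chain reclaimed: " + (a.freed && b.freed));

        // Cycle: root -> x -> y -> x.
        Node x = new Node("x"), y = new Node("y");
        addRoot(x); addRef(x, y); addRef(y, x);
        release(x);
        System.out.println("cycle reclaimed: " + (x.freed || y.freed));
    }
}
```

In this sketch the chain is reclaimed as soon as its root reference is dropped, whereas the two objects in the cycle keep each other's counts above zero and are never reclaimed, even though the program can no longer reach them.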

Among cyclic reference counting algorithms, some of the most prominent are local mark-scan algorithms, as introduced by Martinez et al. [25]. However, these algorithms are often inefficient, incurring significant scanning overheads at run time and creating indeterminate pauses during program execution. Recent work has sought to reduce the scanning overheads of the local mark-scan approach to cyclic reference counting [7], [22], [23].
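To make the scanning overhead concrete, the following is a minimal Java sketch of a synchronous local mark-scan pass in the trial-deletion style. It is a textbook-style simplification under stated assumptions, not the algorithm evaluated in this paper: the names LocalMarkScanDemo, Node, release, markGray, scan, scanBlack and collectWhite, and the scanned counter, are illustrative, and the pass is triggered eagerly on every decrement that leaves a non-zero count.

```java
import java.util.ArrayList;
import java.util.List;

final class LocalMarkScanDemo {
    enum Color { BLACK, GRAY, WHITE }   // live or unvisited, under test, garbage

    static final class Node {
        int rc;
        Color color = Color.BLACK;
        boolean freed;
        final List<Node> fields = new ArrayList<>();
    }

    static int scanned;                 // objects visited by markGray

    static void addRoot(Node n) { n.rc++; }
    static void addRef(Node src, Node dst) { src.fields.add(dst); dst.rc++; }

    /** Drop one reference; a non-zero result makes n a candidate cycle root. */
    static void release(Node n) {
        if (--n.rc == 0) {
            n.freed = true;
            for (Node c : n.fields) release(c);
            n.fields.clear();
        } else {
            markGray(n);
            scan(n);
            collectWhite(n);
        }
    }

    /** Trial deletion: subtract every reference internal to the subgraph. */
    static void markGray(Node n) {
        if (n.color == Color.GRAY) return;
        n.color = Color.GRAY;
        scanned++;
        for (Node c : n.fields) { c.rc--; markGray(c); }
    }

    /** A count still above zero means an external reference; otherwise white. */
    static void scan(Node n) {
        if (n.color != Color.GRAY) return;
        if (n.rc > 0) { scanBlack(n); return; }
        n.color = Color.WHITE;
        for (Node c : n.fields) scan(c);
    }

    /** Restore the counts removed by markGray for everything reachable from n. */
    static void scanBlack(Node n) {
        n.color = Color.BLACK;
        for (Node c : n.fields) {
            c.rc++;
            if (c.color != Color.BLACK) scanBlack(c);
        }
    }

    /** Reclaim the objects that remained white after the scan. */
    static void collectWhite(Node n) {
        if (n.color != Color.WHITE) return;
        n.color = Color.BLACK;          // guards against revisiting n
        for (Node c : n.fields) collectWhite(c);
        n.freed = true;
        n.fields.clear();
    }

    public static void main(String[] args) {
        Node x = new Node(), y = new Node();          // root -> x -> y -> x
        addRoot(x); addRef(x, y); addRef(y, x);
        release(x);
        System.out.println("cycle reclaimed: " + (x.freed && y.freed));

        scanned = 0;
        Node a = new Node(), b = new Node();          // two roots -> a -> b
        addRoot(a); addRoot(a); addRef(a, b);
        release(a);
        System.out.println("live objects scanned: " + scanned + ", freed: " + a.freed);
    }
}
```

The second case in main illustrates the inefficiency described above: the object is still live, yet the pass visits it and everything reachable from it before concluding that nothing can be reclaimed.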

However, these recent improvements have made only limited use of programming language semantics to predict the behavior of objects. Knowledge of this behavior could be applied to more quickly identify specific kinds of data structures, including cycles. The types of data structures that can be expected depend on the inter-object connectivity of a given program, which in turn is based on the reference fields among the objects in memory. In many languages, such as Java, the structure of each data type, and the types that it can link to, are very well defined, and can be used to predict the nature of the data structures in which it can participate. Our study shows that this strongly-typed reference feature of the Java language can be utilized to improve the cyclic reference counting algorithm effectively.

The contribution of this paper is threefold: to introduce a new classification in modeling the predictable inter-object connectivity; to present the concept of double reference counts for capturing the relation of references between objects; and to propose a new cyclic reference counting algorithm which can distinguish dispensable operations based on the previous two schemes to reduce the scanning overheads effectively. Moreover, we implement our proposed algorithm in Jikes RVM [1], [2] and measure the performance over various Java benchmarks.

The rest of the paper is organized as follows. In Section 2, we describe the current cyclic reference counting algorithms, including their strengths and weaknesses. In Section 3, we introduce a novel object classification model based on inter-object connectivity. In Sections 4 and 5, we present our double reference counts algorithm and analyze our experimental results. We then review prior related work in Section 6. Finally, we conclude the paper with some remarks in Section 7.

Section snippets

Cyclic reference counting

Cyclic data structures are structures in which an object is reachable from itself, either directly or indirectly (through other objects). Generally, reference counting maintains a reference count (RC) for each object, indicating the number of pointers that reference the object. Adding or deleting a reference to an object increments or decrements its RC, respectively. If an object's RC drops to zero after reference deletions, the object should be recycled. However, the

Classification models

The Java programming language is strongly typed, which means that every variable and every reference has a type that is known at compile time. This feature allows us to formulate much more detailed properties of objects than we could obtain from an arbitrary directed graph. In this section, we present a new classification model for object connectivity that makes the best use of the fixed types of reference fields in the Java language.
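As an illustration of how declared field types constrain connectivity, the following Java sketch uses reflection to decide, conservatively, whether instances of a class can ever take part in a reference cycle. It is not the classification model of this paper, whose details are not reproduced in this excerpt; the rule used here (only primitive fields, or fields whose declared type is a final class that itself passes the test) and all names, including TypeConnectivityDemo and isAcyclic, are assumptions made for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;

final class TypeConnectivityDemo {
    /** True if exact instances of c can never be part of a reference cycle,
     *  judged conservatively from declared field types alone. */
    static boolean isAcyclic(Class<?> c) {
        return isAcyclic(c, new HashSet<>());
    }

    private static boolean isAcyclic(Class<?> c, Set<Class<?>> visiting) {
        if (!visiting.add(c)) return false;       // the type can reach itself
        try {
            // Walk the class and its superclasses to cover inherited fields.
            for (Class<?> k = c; k != null; k = k.getSuperclass()) {
                for (Field f : k.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (!targetIsAcyclic(f.getType(), visiting)) return false;
                }
            }
            return true;
        } finally {
            visiting.remove(c);
        }
    }

    /** A field type is safe if it is primitive, an array of safe elements,
     *  or a final class (no unknown subclasses) that is itself acyclic. */
    private static boolean targetIsAcyclic(Class<?> t, Set<Class<?>> visiting) {
        if (t.isPrimitive()) return true;
        if (t.isArray()) return targetIsAcyclic(t.getComponentType(), visiting);
        return Modifier.isFinal(t.getModifiers()) && isAcyclic(t, visiting);
    }

    // Hypothetical example classes.
    static final class Point { int x, y; }
    static final class Segment { Point from, to; long[] samples; }
    static final class Ring { Ring next; }
    static final class Box { Object payload; }

    public static void main(String[] args) {
        System.out.println("Point:   " + isAcyclic(Point.class));
        System.out.println("Segment: " + isAcyclic(Segment.class));
        System.out.println("Ring:    " + isAcyclic(Ring.class));
        System.out.println("Box:     " + isAcyclic(Box.class));
    }
}
```

Under this rule Point and Segment are classified as acyclic, so a collector could skip them as candidate cycle roots, while Ring (a self-referential type) and Box (a field of the non-final type Object) must still be scanned. A finer classification would aim to distinguish more cases than this all-or-nothing test.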

The double reference counts algorithm

In addition to the new object classification model, we propose another, more effective approach to improving the local mark-scan algorithm: the double reference counts (DRC) algorithm. In this section, we introduce the structures used in the DRC algorithm and describe the algorithm in detail.

Analysis and implementation

In this section, we present the implementation details of the algorithm, as well as its experimental results. We implement our proposed algorithm in Jikes RVM and measure its performance on selected SPECjvm98 and DaCapo benchmarks. Our experiments show that the DRC algorithm can significantly reduce the pause time and the scanning overheads of the existing local mark-scan algorithm.

Prior work

Aside from local mark-scan algorithms, cyclic reference counting was addressed by Brownbridge [12] with weak and strong references, an approach that was extended in [3] to optimize scanning in databases. Although Brownbridge also used multiple reference types and corresponding reference counts, his reference types were based only on the structure of the heap at the time objects and references were created, and made no use of class connectivity.

Local mark-scan algorithms were explored previously in [3], [7]

Conclusions

Cyclic reference counting offers a naturally incremental, fully correct garbage collection algorithm. While the basic local mark-scan algorithm suffers from poor efficiency, there are a variety of methods by which its efficiency may be improved. This paper presents a new method to model object connectivity in Java programs, which categorizes objects and references more effectively and can thereby improve the efficiency of the cycle collection phase. Besides, this paper

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Nos. 0296131 (ITR), 0219870 (ITR), and 0098235. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References (28)

  • S.M. Blackburn et al.

    The DaCapo benchmarks: Java benchmarking development and analysis

  • S.M. Blackburn et al.

    Ulterior reference counting: fast garbage collection without a long wait

  • S.M. Blackburn et al.

    Oil and water? High performance garbage collection in Java with MMTk

  • H. Boehm

    The space cost of lazy reference counting

This work is partially supported by National Science Council under the Grant NSC97-2221-E-011-097-.