A tunable hybrid memory allocator

https://doi.org/10.1016/j.jss.2005.09.003

Abstract

Dynamic memory management can account for up to 60% of total program execution time. Object-oriented languages such as C++ can use 20 times more memory than procedural languages like C. Poor memory management can waste memory severely, with programs consuming several times the memory they actually need, and can also degrade performance. Many widely used allocators waste memory, CPU time, or both. Since computer memory is an expensive and limited resource, its efficient utilization is necessary. No single memory allocator can deliver the best performance and the least memory consumption for all programs, so easily tunable allocators are required. General-purpose allocators that come with operating systems give less than optimal performance or memory consumption. An allocator with a few tunable parameters can be tailored to a program’s needs for optimal performance and memory consumption. Our tunable hybrid allocator design shows 11–54% better performance and nearly equal memory consumption when compared to the well-known Doug Lea allocator in seven benchmark programs.

Introduction

Computer programs usually cannot foresee the amount of memory they will need to perform their tasks, which often depends on the inputs provided to them. Moreover, object-oriented programming languages such as C++ use dynamic memory transparently to the programmer (Calder et al., 1994, Chang et al., 2001). C++ programs can use 20 times more memory than C programs (Haggander and Lundberg, 1998). Operating systems usually provide system library routines and space to allocate and free dynamic memory for the program. Dynamic memory allocation and deallocation can constitute up to 60% of total program execution time (Berger et al., 2001, Zorn and Grunwald, 1992a). The increasing program demand for dynamic memory has led to the search for more efficient allocation algorithms that minimize time and memory costs (Chang and Daugherty, 2000, Nilsen and Gao, 1995). The memory allocation issue arises in permanent storage as well, and algorithms similar to those used for dynamic memory allocation are employed there for optimal performance (Iyengar et al., 2001).

In C++ programs, dynamic memory (also called heap memory) is allocated and freed by invoking the operators new and delete, and in C by calls to the library routines malloc(), realloc(), and free(). Usually, the C++ new and delete operators rely on the C library routines for dynamic memory. These library routines can be implemented using different algorithms. A dynamic memory management algorithm is often referred to simply as an allocator or allocation algorithm. An allocator’s memory consumption for a particular program is the high-water mark of memory it takes from the operating system to satisfy the program’s requests for dynamic memory (Berger et al., 2001). An allocator’s performance is the amount of time it consumes to perform all its tasks. In Unix systems, an allocator could use the sbrk library routine or the memory mapping system call, mmap, to obtain dynamic memory from the operating system. In the program’s memory space the stack grows downwards from the top and the heap grows upwards towards the stack, as shown in Fig. 1.
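As a concrete illustration of this layering, the sketch below (a minimal example under the assumptions stated in the comments, not the internals of any particular C++ runtime and not the allocator described in this paper) shows a global operator new forwarded to malloc and a helper that grows the heap with sbrk; mmap would be the usual alternative for large blocks.

    #include <cstdint>   // intptr_t
    #include <cstdlib>   // std::malloc, std::free
    #include <new>       // std::bad_alloc
    #include <unistd.h>  // sbrk (POSIX)

    // Global operator new layered on malloc(); many C++ runtimes work this way.
    void* operator new(std::size_t size) {
        if (void* p = std::malloc(size ? size : 1))
            return p;
        throw std::bad_alloc();
    }

    void operator delete(void* p) noexcept {
        std::free(p);
    }

    // Inside an allocator, a request for more heap from the OS might look like this.
    void* get_heap_from_os(std::size_t bytes) {
        void* p = sbrk(static_cast<intptr_t>(bytes));   // extend the heap upwards
        return (p == reinterpret_cast<void*>(-1)) ? nullptr : p;
    }

    int main() {
        int* x = new int(42);   // goes through the malloc-backed operator new above
        delete x;
        return 0;
    }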

The allocator’s job is to provide memory to the program when requested and take it back when the program returns it. It has to obtain memory from the operating system when it has none left, as at the beginning of program execution, and keep track of the memory returned by the program so that it can be reused to service future requests. Furthermore, the allocator should try to do all these tasks in the least possible amount of time using the least possible amount of heap memory taken from the OS. In other words, the allocator should try to maximize performance and minimize memory consumption (Wilson et al., 1995).

There appears to be a trade-off between performance and memory consumption, however (Hasan and Chang, 2003). In the course of memory allocation and deallocation by the program, the contiguous heap memory space becomes fragmented because deallocations are isolated and usually do not occur in the same sequence as the allocations. Fragmentation, the proliferation of small disjoint free memory blocks in the contiguous heap space, leads to wasted memory and an increase in the allocator’s memory consumption. Fig. 2 shows linked fragmented free memory blocks between allocated (shaded) blocks. A program request for six words of memory in this fragmented heap cannot be satisfied even though 10 words of memory are free. More memory will have to be obtained from the OS, increasing the allocator’s memory consumption due to fragmentation.
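To make the arithmetic of this example concrete, the toy sketch below (block sizes are illustrative and not a literal transcription of Fig. 2) walks a free list holding 10 free words in total and reports that no single block can hold a 6-word request:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct FreeBlock { std::size_t words; };

    // Returns true if some single free block can hold the request.
    bool can_satisfy(const std::vector<FreeBlock>& free_list, std::size_t request_words) {
        for (const FreeBlock& b : free_list)
            if (b.words >= request_words)
                return true;
        return false;
    }

    int main() {
        std::vector<FreeBlock> free_list = {{3}, {2}, {4}, {1}};  // 10 words free in all
        std::size_t total = 0;
        for (const FreeBlock& b : free_list) total += b.words;

        std::cout << "total free words: " << total << '\n';                // prints 10
        std::cout << "6-word request satisfiable: "
                  << (can_satisfy(free_list, 6) ? "yes" : "no") << '\n';   // prints no
        return 0;
    }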

If memory consumption is to be minimized, more time will be needed to manage the limited heap carefully enough to minimize the chances of fragmentation. Thus heap fragmentation is the chief problem (Beck, 1982, Denning, 1970, Wilson et al., 1995) that has to be solved to minimize memory consumption. If unlimited heap memory were available, allocation algorithms would be very fast, but unfortunately memory is expensive and limited. The goal of minimizing memory consumption conflicts with the goal of high performance, and a good allocator has to be able to balance the two interests (Wilson et al., 1995).

For any given allocation algorithm it is possible to find a program whose memory allocation and free sequence will ‘beat’ the allocator’s policy, i.e., cause it to increase memory consumption and/or degrade performance (Garey et al., 1972, Wilson et al., 1995). For example, a program specially designed to deallocate heap memory blocks that are not contiguous to any existing free block will cause very high fragmentation. An allocator whose policy is to fight fragmentation by immediately coalescing contiguous free blocks will be rendered ineffective by such a program. In practice, programs are written to solve specific problems, and therefore this problem does not arise (Stephenson, 1983). Fortunately, real application programs show regular patterns in their memory allocation and deallocation behavior that can be exploited to create high-performance allocation algorithms (Wilson et al., 1995, Zorn and Grunwald, 1992a). Programs tend to allocate a large number of small heap memory objects, a very small number of objects larger than 1 KB, and most of the allocations are for a small number of distinct sizes (Barrett and Zorn, 1993). These properties of programs suggest that an allocator that handles small and large objects differently might be more efficient.
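A minimal sketch of that idea follows; the 1 KB threshold echoes the observation above, but the routing and the stub small-object path are illustrative only and are not the allocator proposed in this paper.

    #include <cstddef>
    #include <cstdlib>

    constexpr std::size_t kLargeThreshold = 1024;   // objects larger than 1 KB are rare

    // Stub: a real small-object path would use segregated per-size free lists;
    // this one falls back to malloc so the sketch stays self-contained.
    void* small_alloc(std::size_t size) { return std::malloc(size); }

    // Large, infrequent requests can be handed to the general-purpose allocator
    // (or obtained directly from the OS, e.g. with mmap).
    void* large_alloc(std::size_t size) { return std::malloc(size); }

    void* hybrid_alloc(std::size_t size) {
        return (size <= kLargeThreshold) ? small_alloc(size) : large_alloc(size);
    }

    int main() {
        void* p = hybrid_alloc(64);      // takes the small-object path
        void* q = hybrid_alloc(4096);    // takes the large-object path
        std::free(p);
        std::free(q);
        return 0;
    }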

Given that programs vary widely in their dynamic memory usage and different allocation policies work better for different programs, no single allocator or allocation policy can work best for all programs. Memory allocators provided with operating systems show less than optimal memory consumption or performance and can in fact be very inefficient for some programs. The most useful allocator, therefore, must be flexible and tunable to the particular needs of each program. To be of practical use it must also be easily and quickly tunable. This is the rationale behind the tunable allocator proposed in this paper.

The rest of this paper discusses some related dynamic memory allocation algorithms in Section 2, describes the design of our tunable allocator in Section 3, describes the test programs, allocators, and inputs in Section 4, and finally reports, compares, and analyzes the results in Section 5. We show with results from seven well-known programs that our allocator performs better than one of the fastest known allocators, the Doug Lea allocator. Section 6 summarizes the paper’s conclusions.

Section snippets

Related allocation algorithms

The basic allocator data structure is a linked list of free memory blocks in the heap space, as shown in Fig. 2 (Johnstone and Wilson, 1998). The linked list is called the free list as it contains available memory blocks. The available blocks comprise blocks that were allocated and later freed by the program, preallocated blocks, and split or coalesced blocks. In some allocators, other structures such as Cartesian trees, bitmaps, and multiple free lists are also used. A program request to the
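For readers who prefer code, the fragment below sketches that basic structure: a singly linked free list searched first-fit. Block headers, splitting, and coalescing are omitted, and this is a generic illustration rather than any specific allocator discussed here.

    #include <cstddef>
    #include <cstdio>

    struct FreeBlock {
        std::size_t size;    // usable bytes in this free block
        FreeBlock*  next;    // next free block in the list
    };

    // First-fit: unlink and return the first block large enough, or nullptr.
    FreeBlock* first_fit(FreeBlock** head, std::size_t size) {
        for (FreeBlock** link = head; *link != nullptr; link = &(*link)->next) {
            if ((*link)->size >= size) {
                FreeBlock* found = *link;
                *link = found->next;     // unlink from the free list
                return found;
            }
        }
        return nullptr;                  // no fit: grow the heap or fail
    }

    int main() {
        // Three free blocks of 16, 48, and 32 bytes, linked into a list.
        FreeBlock c = {32, nullptr}, b = {48, &c}, a = {16, &b};
        FreeBlock* head = &a;

        FreeBlock* hit = first_fit(&head, 40);   // finds the 48-byte block
        std::printf("got block of %zu bytes\n", hit ? hit->size : 0);
        return 0;
    }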

Design and implementation of allocator

We aim for an allocator that gives high performance with low memory consumption. Since no allocation algorithm can be optimal for all programs, we have made our allocator’s algorithm flexible and easily tunable so that it can be used in any program to deliver good performance with low memory consumption. This makes the programmer’s job much easier than having to write a new custom allocator for every program that needs dynamic memory optimization. By providing a small set of tunable parameters
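As an illustration of what such tuning could look like in code, the struct below uses invented parameter names and default values; they are hypothetical and are not the actual tunable parameters of the HC allocator described in this paper.

    #include <cstddef>
    #include <cstdio>

    // Hypothetical configuration: these names and values are invented for this
    // sketch and are not the HC allocator's real parameter set.
    struct AllocatorConfig {
        std::size_t small_object_limit;   // requests at or below this take the fast path
        std::size_t os_chunk_size;        // how much heap to request from the OS at a time
        bool        eager_coalescing;     // trade time for lower fragmentation
    };

    // Two example tunings: one favoring speed, one favoring low memory consumption.
    constexpr AllocatorConfig kTunedForSpeed  = {512, 256 * 1024, false};
    constexpr AllocatorConfig kTunedForMemory = {128,  64 * 1024, true};

    int main() {
        std::printf("speed tuning: fast path up to %zu bytes\n",
                    kTunedForSpeed.small_object_limit);
        return 0;
    }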

Test programs and allocators

We tested our allocator with seven different C/C++ programs shown in Table 2 that allocate large numbers of heap objects. The memory allocation and deallocation behavior of these applications was captured in a trace file with our tracing allocator and simulated with a driver; thus, most or all of the program execution time was devoted to allocating and freeing heap objects. The driver for each simulated application program is a C language source file automatically generated by a shell script
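The snippet below sketches the replay idea in general terms; the record format is invented for illustration, and the actual drivers used in the paper are auto-generated C source files rather than a generic trace reader like this one.

    // Each trace record is assumed to be "a <id> <bytes>" for an allocation
    // or "f <id>" for a free; this format is hypothetical.
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <map>

    int main() {
        std::map<long, void*> live;          // trace object id -> heap pointer
        char op;
        long id, bytes;
        // Read the trace from stdin and replay it against malloc/free.
        while (std::scanf(" %c %ld", &op, &id) == 2) {
            if (op == 'a' && std::scanf(" %ld", &bytes) == 1) {
                live[id] = std::malloc(static_cast<std::size_t>(bytes));
            } else if (op == 'f') {
                std::free(live[id]);
                live.erase(id);
            }
        }
        return 0;
    }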

Performance and memory consumption

Performance was measured on a Pentium II 400 MHz machine with 128 MB of memory, running RedHat Linux 6.1. Paging and swapping activity was found to be almost negligible in the test runs but was not studied separately. We measured elapsed clock cycles, converted to milliseconds, to compare the execution times of the allocators. Lower execution time meant higher performance. Memory consumption was recorded by each allocator and was output after the completion of execution and recording of execution time
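One way to obtain such cycle counts on x86 and convert them to milliseconds is sketched below; this is a generic illustration rather than the authors’ actual measurement harness, and the 400 MHz constant simply matches the Pentium II test machine mentioned above.

    #include <cstdint>
    #include <cstdio>
    #include <x86intrin.h>   // __rdtsc (GCC/Clang, x86 only)

    int main() {
        const double cpu_hz = 400e6;          // Pentium II 400 MHz test machine

        std::uint64_t start = __rdtsc();
        // ... workload being timed, e.g. replaying an allocation trace ...
        std::uint64_t end = __rdtsc();

        double ms = (end - start) / cpu_hz * 1000.0;   // cycles -> milliseconds
        std::printf("elapsed: %.3f ms\n", ms);
        return 0;
    }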

Conclusion and future work

Given that no allocator or allocation policy can be perfect for all programs, easily tunable allocators that can be tailored to individual program requirements are needed. In this paper, we have described the design and implementation of a tunable allocator (HC allocator) that can be used with generic settings, or tuned for least memory consumption or maximum performance. The small set of tunable parameters enables the allocator to utilize a range of allocation policies from simple segregated

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant Nos. 0296131 (ITR), 0219870 (ITR), and 0098235. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References (26)

  • J.M. Chang et al. An efficient data structure for dynamic memory management. The Journal of Systems and Software (2000)
  • J.M. Chang et al. A study of the allocation behavior of C++ programs. The Journal of Systems and Software (2001)
  • Barrett, D., Zorn, B.G., 1993. Using lifetime predictors to improve memory allocation performance. In: Proceedings of...
  • C. Bays. A comparison of next-fit, first-fit and best-fit. Communications of the ACM (1977)
  • L.L. Beck. A dynamic storage allocation technique based on memory residence time. Communications of the ACM (1982)
  • Berger, E.D., Zorn, B.G., McKinley, K.S., 2001. Composing high-performance memory allocators. Programming Languages...
  • B. Calder et al. Quantifying behavioral differences between C and C++ programs. Journal of Programming Languages (1994)
  • J.M. Chang et al. A high-performance memory allocator for object-oriented systems. IEEE Transactions on Computers (1996)
  • P.J. Denning. Virtual memory. Computing Surveys (1970)
  • Fenton, J.S., Payne, D.W., 1974. Dynamic storage allocations of arbitrary sized segments. In: Proceedings of IFIPS, pp....
  • Garey, M.R., Graham, R.L., Ullman, J.D., 1972. Worst case analysis of memory allocation algorithms. In: Proceedings of...
  • Haggander, D., Lundberg, L., 1998. Optimizing memory management in a multithreaded application executing on a...
  • Hasan, Y., Chang, J.M., 2003. A hybrid allocator. In: Proceedings of the 3rd IEEE International Symposium on...

Yusuf Hasan holds B.S. (1990) and M.S. (1995) degrees in Mathematical Computer Science from the University of Illinois at Chicago and is a Ph.D. candidate in Computer Science (as of December 2004) at Illinois Institute of Technology, Chicago. He has to date published eight papers in international conferences and journals in the area of dynamic memory management. His research interests include memory management, computer architecture, compilers, programming languages, and operating systems.

He has 13 years of industry experience in developing a variety of software applications. He has worked for a number of companies including Motorola, MCI, Nextel, and VeriSign in diverse areas such as Motorola’s iDEN digital wireless system’s dispatch Group and Private call processing, Instant Messaging, database programming, GSM, MAP, SS7, SIGTRAN, 3GPP, 3GPP2, and VoIP.

Jien Morris Chang received the B.S. degree in electrical engineering from Tatung Institute of Technology, Taiwan, and the M.S. degree in Electrical Engineering and the Ph.D. degree in Computer Engineering from North Carolina State University, in 1983, 1986, and 1993, respectively.

In 2001, Dr. Chang joined the Department of Electrical and Computer Engineering at Iowa State University, where he is currently an Associate Professor. His industrial experience includes positions at Texas Instruments, Microelectronics Center of North Carolina, and AT&T Bell Laboratories. He was on the faculty of the Department of Electrical Engineering at Rochester Institute of Technology, and the Department of Computer Science at Illinois Institute of Technology (IIT). In 1999, he received the IIT University Excellence in Teaching Award.

Dr. Chang’s research interests include wireless networks, object-oriented systems, computer architecture, and VLSI design and testing. He has published more than 90 technical papers in these areas. His current research projects are supported by three NSF grants (including two ITR awards). He served as the Secretary and Treasurer in 1995 and Vendor Liaison Chair in 1996 for the International ASIC Conference. He was the Conference Chair for the 17th International Conference on Advanced Science and Technology (ICAST 2001), Chicago, Illinois, USA. He was on the program committee of the ACM SIGPLAN 2004 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES’04). He is on the editorial boards of the Journal of Microprocessors and Microsystems and IEEE IT Professional. Dr. Chang is a senior member of IEEE.

A preliminary version of this paper titled “A Hybrid Allocator” appeared in the Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, 6–8 March 2003, pp. 214–221.
