Abstract
Modern memory allocators have to balance many simultaneous demands, including performance, security, the presence of concurrency, and application-specific demands that depend on the context of their use. One increasingly common use case for allocators is as the back end for implementations of languages, such as Swift and Python, that use reference counting to automatically deallocate objects. We present mimalloc, a memory allocator that effectively balances these demands, shows significant performance advantages over existing allocators, and is tailored to support languages that rely on the memory allocator as a back end for reference counting. Mimalloc combines several innovations to achieve this result. First, it uses three page-local sharded free lists to increase locality, avoid contention, and support a highly tuned fast path for allocation and freeing. These free lists also support temporal cadence, which allows the allocator to predictably leave the fast path for regular maintenance tasks such as supporting deferred freeing and handling frees from non-local threads. While influenced by the allocation workload of the reference-counted Lean and Koka programming languages, we show that mimalloc has superior performance to modern commercial memory allocators, including tcmalloc and jemalloc, with speed improvements of 7% and 14%, respectively, on redis, and that it consistently outperforms them across a wide range of sequential and concurrent benchmarks. Allocators tailored to provide an efficient runtime for reference-counting languages reduce the implementation burden on developers and encourage the creation of innovative new language designs.
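To make the idea of three page-local sharded free lists more concrete, the following is a minimal sketch in C and not the actual mimalloc source. The abstract does not name the three lists; the sketch assumes one list that allocation pops from, one for frees by the owning thread, and one (atomic) for frees from other threads. All type and function names (`page_t`, `page_malloc`, `page_free_local`, `page_free_remote`, `page_collect`) are hypothetical. Because the allocation list is only refilled when it runs empty, the slow path is reached after a bounded number of allocations, which is one plausible reading of the temporal cadence mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A free block stores the link to the next free block in its own memory. */
typedef struct block_s { struct block_s *next; } block_t;

/* Hypothetical page header with three sharded free lists (assumed roles). */
typedef struct page_s {
    block_t *free;                   /* popped by the allocation fast path */
    block_t *local_free;             /* frees made by the owning thread */
    _Atomic(block_t *) thread_free;  /* frees made by other threads */
    size_t used;                     /* blocks currently handed out */
} page_t;

/* Carve `count` blocks of `block_size` bytes out of `area`. */
static void page_init(page_t *page, void *area, size_t block_size, size_t count) {
    assert(block_size >= sizeof(block_t));
    page->free = NULL;
    page->local_free = NULL;
    atomic_init(&page->thread_free, NULL);
    page->used = 0;
    for (size_t i = count; i > 0; i--) {  /* build so the head is block 0 */
        block_t *b = (block_t *)((char *)area + (i - 1) * block_size);
        b->next = page->free;
        page->free = b;
    }
}

/* Slow path: runs only when `free` is empty. Takes over both deferred
   lists in one step and accounts for the remotely freed blocks. */
static void page_collect(page_t *page) {
    block_t *tfree = atomic_exchange(&page->thread_free, NULL);
    if (tfree != NULL) {
        block_t *tail = tfree;
        size_t n = 1;
        while (tail->next != NULL) { tail = tail->next; n++; }
        tail->next = page->local_free;  /* remote frees, then local frees */
        page->used -= n;
        page->free = tfree;
    } else {
        page->free = page->local_free;
    }
    page->local_free = NULL;
}

/* Fast path: pop from `free`; fall back to the slow path when it is empty. */
static void *page_malloc(page_t *page) {
    block_t *b = page->free;
    if (b == NULL) {
        page_collect(page);
        b = page->free;
        if (b == NULL) return NULL;     /* page is full */
    }
    page->free = b->next;
    page->used++;
    return b;
}

/* Free by the owning thread: no atomics needed. */
static void page_free_local(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    b->next = page->local_free;
    page->local_free = b;
    page->used--;
}

/* Free by another thread: lock-free push onto the atomic list. */
static void page_free_remote(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    block_t *head = atomic_load(&page->thread_free);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak(&page->thread_free, &head, b));
}
```

In this sketch the allocation fast path touches only `free`, and a local free touches only `local_free`, so neither needs atomic operations; only frees from other threads pay for a compare-and-swap, and the owner picks those up in bulk the next time its allocation list runs empty.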
Acknowledgements
We would like to thank Matthew Parkinson and the other authors of snmalloc for their valuable feedback and for their encouragement to include the xmallocN benchmark.
A Evaluation of Peak Working Memory
Figure 6 shows the peak working memory (RSS) relative to mimalloc. These figures correspond to the earlier performance Figs. 2 and 3, respectively. Note that the memory usage of xmallocN should be disregarded, since the faster the benchmark runs, the more memory it uses. Also, the cfrac, espresso, and cscratchN benchmarks use very little active memory, so the differences in RSS are not very important there.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Leijen, D., Zorn, B., de Moura, L. (2019). Mimalloc: Free List Sharding in Action. In: Lin, A. (ed.) Programming Languages and Systems. APLAS 2019. Lecture Notes in Computer Science, vol 11893. Springer, Cham. https://doi.org/10.1007/978-3-030-34175-6_13
DOI: https://doi.org/10.1007/978-3-030-34175-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34174-9
Online ISBN: 978-3-030-34175-6
eBook Packages: Computer Science, Computer Science (R0)