Mimalloc: Free List Sharding in Action

Leijen, Daan; Zorn, Benjamin; de Moura, Leonardo

doi:10.1007/978-3-030-34175-6_13

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11893)

Included in the following conference series:

Asian Symposium on Programming Languages and Systems

Abstract

Modern memory allocators have to balance many simultaneous demands, including performance, security, the presence of concurrency, and application-specific demands depending on the context of their use. One increasingly common use case for allocators is as back-end implementations of languages, such as Swift and Python, that use reference counting to automatically deallocate objects. We present mimalloc, a memory allocator that effectively balances these demands, shows significant performance advantages over existing allocators, and is tailored to support languages that rely on the memory allocator as a backend for reference counting. Mimalloc combines several innovations to achieve this result. First, it uses three page-local sharded free lists to increase locality, avoid contention, and support a highly tuned allocate and free fast path. These free lists also support temporal cadence, which allows the allocator to predictably leave the fast path for regular maintenance tasks such as supporting deferred freeing and handling frees from non-local threads. While influenced by the allocation workload of the reference-counted Lean and Koka programming languages, we show that mimalloc has superior performance to modern commercial memory allocators, including tcmalloc and jemalloc, with speed improvements of 7% and 14%, respectively, on redis, and consistently outperforms them over a wide range of sequential and concurrent benchmarks. Allocators tailored to provide an efficient runtime for reference-counting languages reduce the implementation burden on developers and encourage the creation of innovative new language designs.
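
The sharded free lists described in the abstract can be illustrated with a minimal C sketch. It is not the mimalloc implementation. It assumes the three page-local lists are an allocation list, a list for frees by the owning thread, and an atomically updated list for frees by other threads. The names `page_t`, `free`, `local_free`, `thread_free`, `page_collect`, and the constants `BLOCK_SIZE` and `BLOCK_COUNT` are illustrative assumptions, and the sketch covers a single page with one fixed block size.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BLOCK_SIZE  32   /* hypothetical block size in bytes */
#define BLOCK_COUNT 8    /* hypothetical number of blocks per page */

/* A free block stores the link to the next free block in its own memory. */
typedef struct block_s { struct block_s* next; } block_t;

/* Each slot is either a free-list link or BLOCK_SIZE bytes of user data. */
typedef union slot_u {
  block_t       link;
  unsigned char bytes[BLOCK_SIZE];
} slot_t;

typedef struct page_s {
  block_t*          free;        /* allocation list, popped on the fast path */
  block_t*          local_free;  /* blocks freed by the owning thread */
  _Atomic(block_t*) thread_free; /* blocks freed by other threads */
  size_t            used;        /* blocks currently counted as in use */
  slot_t            slots[BLOCK_COUNT];
} page_t;

/* Put every slot of the page on the allocation list. */
static void page_init(page_t* page) {
  page->free = NULL;
  page->local_free = NULL;
  atomic_init(&page->thread_free, NULL);
  page->used = 0;
  for (size_t i = BLOCK_COUNT; i > 0; i--) {
    block_t* b = &page->slots[i - 1].link;
    b->next = page->free;
    page->free = b;
  }
}

/* Slow path: runs only when the allocation list is empty. */
static void page_collect(page_t* page) {
  assert(page->free == NULL);
  /* Take the whole cross-thread list with a single atomic exchange. */
  block_t* tfree = atomic_exchange(&page->thread_free, NULL);
  while (tfree != NULL) {
    block_t* next = tfree->next;
    tfree->next = page->local_free;
    page->local_free = tfree;
    page->used--;              /* remote frees are counted only here */
    tfree = next;
  }
  /* The collected blocks become the new allocation list. */
  page->free = page->local_free;
  page->local_free = NULL;
}

/* Fast path: pop from the allocation list without any atomic operation. */
static void* page_malloc(page_t* page) {
  block_t* b = page->free;
  if (b == NULL) {
    page_collect(page);
    b = page->free;
    if (b == NULL) return NULL;  /* every block of the page is in use */
  }
  page->free = b->next;
  page->used++;
  return b;
}

/* Free by the owning thread: a plain push onto local_free. */
static void page_free_local(page_t* page, void* p) {
  block_t* b = (block_t*)p;
  b->next = page->local_free;
  page->local_free = b;
  page->used--;
}

/* Free by another thread: a lock-free push onto thread_free. */
static void page_free_remote(page_t* page, void* p) {
  block_t* b = (block_t*)p;
  block_t* head = atomic_load(&page->thread_free);
  do {
    b->next = head;
  } while (!atomic_compare_exchange_weak(&page->thread_free, &head, b));
}
```

Because frees never push onto the allocation list in this sketch, that list runs empty after at most `BLOCK_COUNT` allocations, so `page_collect` is reached at a predictable interval. This is one plausible reading of the temporal cadence mentioned above. A caller would initialize a page with `page_init`, allocate with `page_malloc`, and release blocks with `page_free_local` or `page_free_remote` depending on which thread owns the page.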

Notes

1. Do not confuse the word page with OS pages. A mimalloc page is larger and corresponds more closely to a superblock [4] or subslab [24] in other allocators.

References

Aigner, M., Kirsch, C.M., Lippautz, M., Sokolova, A.: Fast, multicore-scalable, low-fragmentation memory allocation through large virtual memory and global data structures. CoRR abs/1503.09006 (2015). http://arxiv.org/abs/1503.09006
Amazon EC2. Cloud Instance Types (2019). https://aws.amazon.com/ec2/instance-types/
Barnes, J., Hut, P.: A hierarchical O(N Log N) force-calculation algorithm. Nature 324, 446–449 (1986). https://doi.org/10.1038/324446a0
Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. In: Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IX, Cambridge, Massachusetts, USA, pp. 117–128. ACM (2000). https://doi.org/10.1145/378993.379232
Berger, E.D., Zorn, B.G.: DieHard: probabilistic memory safety for unsafe languages. In: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2006, Ottawa, Ontario, Canada, pp. 158–168 (2006). https://doi.org/10.1145/1133981.1134000
Berger, E.D., Zorn, B.G., McKinley, K.S.: Reconsidering Custom Memory Allocation, vol. 37, no. 11. ACM (2002)
Crundal, T.: Reducing Active-False Sharing in TCMalloc (2016). http://courses.cecs.anu.edu.au/courses/CSPROJECTS/16S1/Reports/Timothy_Crundal_Report.pdf. CS16S1 project at the Australian National University
Evans, J.: Jemalloc. In: Proceedings of the 2006 BSDCan Conference, BSDCan 2006, Ottawa, CA, May 2006. http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
Feng, Y., Berger, E.D.: A locality-improving dynamic memory allocator. In: Proceedings of the 2005 Workshop on Memory System Performance, Chicago, Illinois, USA, pp. 68–77, January 2005. https://doi.org/10.1145/1111583.1111594
Google. Tcmalloc (2014). https://github.com/gperftools/gperftools.
Grunwald, D., Zorn, B., Henderson, R.: Improving the cache locality of memory allocation. ACM SIGPLAN Not. 28(6), 177–186 (1993). https://doi.org/10.1145/173262.155107
Hudson, R.L., Saha, B., Adl-Tabatabai, A.R., Hertzberg, B.C.: McRT-Malloc: a scalable transactional memory allocator. In: Proceedings of the 5th International Symposium on Memory Management, pp. 74–83. ACM (2006)
Intel. Thread Building Blocks (TBB) (2017). https://www.threadingbuildingblocks.org/
Jansson, M.: Rpmalloc (2017). https://github.com/rampantpixels/rpmalloc
Kukanov, A., Voss, M.J.: The foundations for scalable multi-core software in Intel threading building blocks. Intel Technol. J. 11(4), 309–322 (2007)
Kuszmaul, B.C.: SuperMalloc: a super fast multithreaded malloc for 64-bit machines. In: Proceedings of the 2015 International Symposium on Memory Management, ISMM 2015, Portland, OR, USA, pp. 41–55. ACM (2015). https://doi.org/10.1145/2754169.2754178
Larson, P.-Å., Krishnan, M.: Memory allocation for long-running server applications. In: Proceedings of the 1998 International Symposium on Memory Management, ISMM 1998, pp. 176–185 (1998)
Leijen, D.: Koka: programming with row polymorphic effect types. In: MSFP 2014, 5th Workshop on Mathematically Structured Functional Programming (2014). https://doi.org/10.4204/EPTCS.153.8
Leijen, D.: Type directed compilation of row-typed algebraic effects. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017), Paris, France, pp. 486–499, January 2017. https://doi.org/10.1145/3009837.3009872
Leijen, D.: Mimalloc Repository, June 2019. https://github.com/microsoft/mimalloc
Leijen, D.: Mimalloc Benchmark Repository, June 2019. https://github.com/daanx/mimalloc-bench
Lever, C., Boreham, D.: Malloc() performance in a multithreaded Linux environment. In: USENIX Annual Technical Conference, Freenix Session, San Diego, CA, June 2000. Malloc-test available from https://github.com/kuszmaul/SuperMalloc/tree/master/tests
Liétar, P., et al.: Snmalloc: a message passing allocator. In: Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, pp. 122–135. ACM (2019)
Liétar, P., et al.: Snmalloc: a message passing allocator. In: Proceedings of the 2019 International Symposium on Memory Management, ISMM 2019, Phoenix, AZ (2019). https://github.com/Microsoft/snmalloc
MicroQuill. SmartHeap (2006). http://www.microquill.com. sh6bench available at http://www.microquill.com/smartheap/shbench/bench.zip. sh8bench available at http://www.microquill.com/smartheap/SH8BENCH.zip
de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The lean theorem prover (system description). In: Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_26
Novark, G., Berger, E.D.: DieHarder: securing the heap. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS 2010, Chicago, Illinois, USA, pp. 573–584 (2010). https://doi.org/10.1145/1866307.1866371
OLogN Technologies AG (ITHare.com). Testing Memory Allocators: ptmalloc2 vs Tcmalloc vs Hoard vs Jemalloc, While Trying to Simulate Real-World Loads, July 2018. http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/. Test available at https://github.com/node-dot-cpp/alloc-test
Sanner, M.F., et al.: Python: a programming language for software integration and development. J. Mol. Graph. Model. 17(1), 57–61 (1999)
Schweizer, H., Besta, M., Hoefler, T.: Evaluating the cost of atomic operations on modern architectures. In: 2015 International Conference on Parallel Architecture and Compilation (PACT), pp. 445–456, October 2015. https://doi.org/10.1109/PACT.2015.24
Sotirov, A.: Heap Feng Shui in JavaScript (2007). https://www.blackhat.com/presentations/bh-europe-07/FSotirov/Presentation/bh-eu-07-sotirov-apr19.pdf. Blackhat Europe
Ullrich, S., de Moura, L.: Counting immutable beans - reference counting optimized for purely functional programming. In: Proceedings of the 31st Symposium on Implementation and Application of Functional Languages (IFL 2019), Singapore, September 2019
Weinstock, C.B., Wulf, W.A.: An efficient algorithm for heap storage allocation. ACM SIGPLAN Not. 23(10), 141–148 (1988)
Wilde, M., Hategan, M., Wozniak, J.M., Clifford, B., Katz, D.S., Foster, I.: Swift: a language for distributed parallel scripting. Parallel Comput. 37(9), 633–652 (2011)

Acknowledgements

We would like to thank Matthew Parkinson and the other authors of snmalloc for their valuable feedback and for their encouragement to include the xmallocN benchmark.

Author information

Authors and Affiliations

Microsoft Research, Redmond, USA
Daan Leijen, Benjamin Zorn & Leonardo de Moura

Corresponding author

Correspondence to Daan Leijen.

Editor information

Editors and Affiliations

University of Kaiserslautern, Kaiserslautern, Germany
Anthony Widjaja Lin

A Evaluation of Peak Working Memory

Figure 6 shows the peak working memory (RSS) relative to mimalloc. These figures correspond to the earlier performance Figs. 2 and 3, respectively. Note that the memory usage of xmallocN should be disregarded, because the faster the benchmark runs, the more memory it uses. Also, the cfrac, espresso, and cscratchN benchmarks use only a little active memory, so the differences in RSS are not very important there.

About this paper

Cite this paper

Leijen, D., Zorn, B., de Moura, L. (2019). Mimalloc: Free List Sharding in Action. In: Lin, A. (eds) Programming Languages and Systems. APLAS 2019. Lecture Notes in Computer Science, vol 11893. Springer, Cham. https://doi.org/10.1007/978-3-030-34175-6_13

DOI: https://doi.org/10.1007/978-3-030-34175-6_13
Published: 18 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34174-9
Online ISBN: 978-3-030-34175-6
eBook Packages: Computer Science, Computer Science (R0)
