Mimalloc: Free List Sharding in Action

  • Conference paper
Programming Languages and Systems (APLAS 2019)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11893)

Abstract

Modern memory allocators have to balance many simultaneous demands, including performance, security, the presence of concurrency, and application-specific demands depending on the context of their use. One increasing use-case for allocators is as back-end implementations of languages, such as Swift and Python, that use reference counting to automatically deallocate objects. We present mimalloc, a memory allocator that effectively balances these demands, shows significant performance advantages over existing allocators, and is tailored to support languages that rely on the memory allocator as a backend for reference counting. Mimalloc combines several innovations to achieve this result. First, it uses three page-local sharded free lists to increase locality, avoid contention, and support a highly tuned allocate and free fast path. These free lists also support temporal cadence, which allows the allocator to predictably leave the fast path for regular maintenance tasks such as supporting deferred freeing and handling frees from non-local threads. Although mimalloc's design was influenced by the allocation workload of the reference-counted Lean and Koka programming languages, we show that it has superior performance to modern commercial memory allocators, including tcmalloc and jemalloc, with speed improvements of 7% and 14%, respectively, on redis, and that it consistently outperforms them over a wide range of sequential and concurrent benchmarks. Allocators tailored to provide an efficient runtime for reference-counting languages reduce the implementation burden on developers and encourage the creation of innovative new language designs.
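
To make the idea of three page-local sharded free lists more concrete, the following is a minimal sketch in C and not mimalloc's actual code. It assumes a page of fixed-size blocks with one list for allocation, one for frees by the owning thread, and one atomically updated list for frees from other threads. All type, field, and function names (`page_t`, `free`, `local_free`, `thread_free`, `page_collect`, and so on) are hypothetical and chosen only for this illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative sketch only; names and layout are assumptions, not mimalloc's API. */
typedef struct block_s { struct block_s *next; } block_t;

typedef struct page_s {
    block_t *free;                   /* allocation list used by the fast path     */
    block_t *local_free;             /* blocks freed by the owning thread         */
    _Atomic(block_t *) thread_free;  /* blocks freed by other (non-local) threads */
} page_t;

/* Carve a caller-provided buffer into `count` blocks and put them on `free`.
   Assumes block_size >= sizeof(block_t) and a suitably aligned buffer. */
static void page_init(page_t *page, void *mem, size_t block_size, size_t count) {
    page->free = NULL;
    page->local_free = NULL;
    atomic_init(&page->thread_free, (block_t *)NULL);
    for (size_t i = count; i > 0; i--) {
        block_t *b = (block_t *)((char *)mem + (i - 1) * block_size);
        b->next = page->free;
        page->free = b;
    }
}

/* Slow path: reached only when `free` is empty, so it runs at a regular
   cadence. It takes the whole remote list with a single atomic exchange,
   merges it into `local_free`, and makes the result the new `free` list. */
static void page_collect(page_t *page) {
    block_t *tfree = atomic_exchange_explicit(&page->thread_free,
                                              (block_t *)NULL,
                                              memory_order_acquire);
    while (tfree != NULL) {
        block_t *next = tfree->next;
        tfree->next = page->local_free;
        page->local_free = tfree;
        tfree = next;
    }
    page->free = page->local_free;
    page->local_free = NULL;
}

/* Fast path: pop from `free` without atomics; fall back to the slow path. */
static void *page_alloc(page_t *page) {
    block_t *b = page->free;
    if (b == NULL) {
        page_collect(page);
        b = page->free;
        if (b == NULL) return NULL;  /* page is full */
    }
    page->free = b->next;
    return b;
}

/* Free by the owning thread: a plain push, no atomic operation needed. */
static void page_free_local(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    b->next = page->local_free;
    page->local_free = b;
}

/* Free by another thread: lock-free push onto the atomic list. */
static void page_free_remote(page_t *page, void *p) {
    block_t *b = (block_t *)p;
    block_t *head = atomic_load_explicit(&page->thread_free, memory_order_relaxed);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak_explicit(&page->thread_free, &head, b,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

In this sketch, freed blocks never go back onto `free` directly. Because of that, the allocation list is guaranteed to run empty after a bounded number of allocations, and the slow path in `page_collect` is the one predictable place where maintenance work, such as gathering frees from other threads, can be done.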


Notes

  1. Do not confuse the word page with OS pages. A mimalloc page is larger and corresponds more closely to a superblock [4] or subslab [24] in other allocators.

References

  1. Aigner, M., Kirsch, C.M., Lippautz, M., Sokolova, A.: Fast, multicore-scalable, low-fragmentation memory allocation through large virtual memory and global data structures. CoRR abs/1503.09006 (2015). http://arxiv.org/abs/1503.09006

  2. Amazon EC2. Cloud Instance Types (2019). https://aws.amazon.com/ec2/instance-types/

  3. Barnes, J., Hut, P.: A hierarchical O(N Log N) force-calculation algorithm. Nature 324, 446–449 (1986). https://doi.org/10.1038/324446a0

  4. Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. In: Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IX, Cambridge, Massachusetts, USA, pp. 117–128. ACM (2000). https://doi.org/10.1145/378993.379232

  5. Berger, E.D., Zorn, B.G.: DieHard: probabilistic memory safety for unsafe languages. In: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2006, Ottawa, Ontario, Canada, pp. 158–168 (2006). https://doi.org/10.1145/1133981.1134000

  6. Berger, E.D., Zorn, B.G., McKinley, K.S.: Reconsidering Custom Memory Allocation, vol. 37, no. 11. ACM (2002)

  7. Crundal, T.: Reducing Active-False Sharing in TCMalloc (2016). http://courses.cecs.anu.edu.au/courses/CSPROJECTS/16S1/Reports/Timothy_Crundal_Report.pdf. CS16S1 project at the Australian National University

  8. Evans, J.: Jemalloc. In: Proceedings of the 2006 BSDCan Conference, BSDCan 2006, Ottawa, CA, May 2006. http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf

  9. Feng, Y., Berger, E.D.: A locality-improving dynamic memory allocator. In: Proceedings of the 2005 Workshop on Memory System Performance, Chicago, Illinois, USA, pp. 68–77, January 2005. https://doi.org/10.1145/1111583.1111594

  10. Google. Tcmalloc (2014). https://github.com/gperftools/gperftools

  11. Grunwald, D., Zorn, B., Henderson, R.: Improving the cache locality of memory allocation. ACM SIGPLAN Not. 28(6), 177–186 (1993). https://doi.org/10.1145/173262.155107

  12. Hudson, R.L., Saha, B., Adl-Tabatabai, A.R., Hertzberg, B.C.: McRT-Malloc: a scalable transactional memory allocator. In: Proceedings of the 5th International Symposium on Memory Management, pp. 74–83. ACM (2006)

  13. Intel. Thread Building Blocks (TBB) (2017). https://www.threadingbuildingblocks.org/

  14. Jansson, M.: Rpmalloc (2017). https://github.com/rampantpixels/rpmalloc

  15. Kukanov, A., Voss, M.J.: The foundations for scalable multi-core software in Intel threading building blocks. Intel Technol. J. 11(4), 309–322 (2007)

  16. Kuszmaul, B.C.: SuperMalloc: a super fast multithreaded malloc for 64-bit machines. In: Proceedings of the 2015 International Symposium on Memory Management, ISMM 2015, Portland, OR, USA, pp. 41–55. ACM (2015). https://doi.org/10.1145/2754169.2754178

  17. Larson, P.-Å., Krishnan, M.: Memory allocation for long-running server applications. In: Proceedings of the 1998 International Symposium on Memory Management, ISMM 1998, pp. 176–185 (1998)

  18. Leijen, D.: Koka: programming with row polymorphic effect types. In: MSFP 2014, 5th Workshop on Mathematically Structured Functional Programming (2014). https://doi.org/10.4204/EPTCS.153.8

  19. Leijen, D.: Type directed compilation of row-typed algebraic effects. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017), Paris, France, pp. 486–499, January 2017. https://doi.org/10.1145/3009837.3009872

  20. Leijen, D.: Mimalloc Repository, June 2019. https://github.com/microsoft/mimalloc

  21. Leijen, D.: Mimalloc Benchmark Repository, June 2019. https://github.com/daanx/mimalloc-bench

  22. Lever, C., Boreham, D.: Malloc() performance in a multithreaded Linux environment. In: USENIX Annual Technical Conference, Freenix Session, San Diego, CA, June 2000. Malloc-test available from https://github.com/kuszmaul/SuperMalloc/tree/master/tests

  23. Liétar, P., et al.: Snmalloc: a message passing allocator. In: Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, pp. 122–135. ACM (2019)

  24. Liétar, P., et al.: Snmalloc: a message passing allocator. In: Proceedings of the 2019 International Symposium on Memory Management, ISMM 2019, Phoenix, AZ (2019). https://github.com/Microsoft/snmalloc

  25. MicroQuill. SmartHeap (2006). http://www.microquill.com. sh6bench available at http://www.microquill.com/smartheap/shbench/bench.zip. sh8bench available at http://www.microquill.com/smartheap/SH8BENCH.zip

  26. de Moura, L., Kong, S., Avigad, J., van Doorn, F., von Raumer, J.: The lean theorem prover (system description). In: Felty, A.P., Middeldorp, A. (eds.) CADE 2015. LNCS (LNAI), vol. 9195, pp. 378–388. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21401-6_26

  27. Novark, G., Berger, E.D.: DieHarder: securing the heap. In: Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS 2010, Chicago, Illinois, USA, pp. 573–584 (2010). https://doi.org/10.1145/1866307.1866371

  28. OLogN Technologies AG (ITHare.com). Testing Memory Allocators: ptmalloc2 vs Tcmalloc vs Hoard vs Jemalloc, While Trying to Simulate Real-World Loads, July 2018. http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/. Test available at https://github.com/node-dot-cpp/alloc-test

  29. Sanner, M.F., et al.: Python: a programming language for software integration and development. J. Mol. Graph. Model. 17(1), 57–61 (1999)

  30. Schweizer, H., Besta, M., Hoefler, T.: Evaluating the cost of atomic operations on modern architectures. In: 2015 International Conference on Parallel Architecture and Compilation (PACT), pp. 445–456, October 2015. https://doi.org/10.1109/PACT.2015.24

  31. Sotirov, A.: Heap Feng Shui in JavaScript (2007). https://www.blackhat.com/presentations/bh-europe-07/FSotirov/Presentation/bh-eu-07-sotirov-apr19.pdf. Blackhat Europe

  32. Ullrich, S., de Moura, L.: Counting immutable beans - reference counting optimized for purely functional programming. In: Proceedings of the 31st Symposium on Implementation and Application of Functional Languages (IFL 2019), Singapore, September 2019

  33. Weinstock, C.B., Wulf, W.A.: An efficient algorithm for heap storage allocation. ACM SIGPLAN Not. 23(10), 141–148 (1988)

  34. Wilde, M., Hategan, M., Wozniak, J.M., Clifford, B., Katz, D.S., Foster, I.: Swift: a language for distributed parallel scripting. Parallel Comput. 37(9), 633–652 (2011)

Acknowledgements

We would like to thank Matthew Parkinson and the other authors of snmalloc for their valuable feedback and for their encouragement to include the xmallocN benchmark.

Author information

Corresponding author

Correspondence to Daan Leijen.

A Evaluation of Peak Working Memory

Figure 6 shows the peak working memory (RSS) relative to mimalloc. These figures correspond to the earlier performance Figs. 2 and 3, respectively. Note that the memory usage of xmallocN should be disregarded because the faster the benchmark runs, the more memory it uses. Also, the cfrac, espresso, and cscratchN benchmarks use only a little active memory, so the differences in RSS are not very important for them.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Leijen, D., Zorn, B., de Moura, L. (2019). Mimalloc: Free List Sharding in Action. In: Lin, A. (eds) Programming Languages and Systems. APLAS 2019. Lecture Notes in Computer Science, vol 11893. Springer, Cham. https://doi.org/10.1007/978-3-030-34175-6_13

  • DOI: https://doi.org/10.1007/978-3-030-34175-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34174-9

  • Online ISBN: 978-3-030-34175-6

  • eBook Packages: Computer Science, Computer Science (R0)
