DOI: 10.1145/3297858.3304064
research-article

HawkEye: Efficient Fine-grained OS Support for Huge Pages

Published: 04 April 2019

ABSTRACT

Effective huge page management in operating systems is necessary to mitigate address translation overheads. However, it remains a difficult area in OS design. Recent work on Ingens uncovered some interesting pitfalls in current huge page management strategies. Using both page access patterns discovered by the OS kernel and fine-grained data from hardware performance counters, we expose problematic aspects of these strategies. In our system, called HawkEye/Linux, we demonstrate alternative ways to address issues related to performance, page fault latency and memory bloat. The primary ideas behind the HawkEye management algorithms are asynchronous page pre-zeroing, de-duplication of zero-filled pages, fine-grained page access tracking, and measurement of address translation overheads through hardware performance counters. Our evaluation shows that HawkEye is more performant, more robust, and better suited to diverse workloads than current state-of-the-art systems.
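Two of the ideas named in the abstract can be illustrated with a small user-space sketch. It is not the HawkEye kernel implementation. The page sizes assume x86-64 (4 KB base pages, 2 MB huge pages), and the function names and the counter inputs `walk_cycles` and `total_cycles` are hypothetical stand-ins for values an OS would obtain from hardware performance counters.

```python
BASE_PAGE = 4096                 # assumed x86-64 base page size (bytes)
HUGE_PAGE = 2 * 1024 * 1024      # assumed x86-64 huge page size (bytes)


def zero_filled_base_pages(region: bytes) -> list[int]:
    """Return indices of base pages in a huge-page-sized region that hold only zero bytes.

    Such pages are candidates for de-duplication against a shared zero page.
    """
    if len(region) != HUGE_PAGE:
        raise ValueError("region must be exactly one huge page long")
    zero_page = bytes(BASE_PAGE)
    return [
        i
        for i in range(HUGE_PAGE // BASE_PAGE)
        if region[i * BASE_PAGE:(i + 1) * BASE_PAGE] == zero_page
    ]


def translation_overhead(walk_cycles: int, total_cycles: int) -> float:
    """Percentage of cycles spent in page walks (hypothetical counter inputs)."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * walk_cycles / total_cycles
```

A region in which only a few base pages were ever written reports most of its 512 base pages as zero-filled, which is the kind of memory bloat that de-duplication of zero-filled pages is meant to recover.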

References

  1. Mohammad Agbarya, Idan Yaniv, and Dan Tsafrir. 2018. Memomania: From Huge to Huge-Huge Pages. In Proceedings of the 11th ACM International Systems and Storage Conference (SYSTOR '18). ACM, New York, NY, USA, 112--112.
  2. Hanna Alam, Tianhao Zhang, Mattan Erez, and Yoav Etsion. 2017. Do-It-Yourself Virtual Memory Translation. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 457--468.
  3. Nadav Amit, Dan Tsafrir, and Assaf Schuster. 2014. VSwapper: A Memory Swapper for Virtualized Environments. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14). ACM, New York, NY, USA, 349--366.
  4. D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. 1991. The NAS Parallel Benchmarks - Summary and Preliminary Results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91). ACM, New York, NY, USA, 158--165.
  5. Thomas W. Barr, Alan L. Cox, and Scott Rixner. 2010. Translation Caching: Skip, Don't Walk (the Page Table). In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). ACM, New York, NY, USA, 48--59.
  6. Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, and Michael M. Swift. 2013. Efficient Virtual Memory for Big Memory Servers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA '13). ACM, New York, NY, USA, 237--248.
  7. Scott Beamer, Krste Asanovic, and David A. Patterson. 2015. The GAP Benchmark Suite. CoRR, Vol. abs/1508.03619 (2015). arXiv:1508.03619 http://arxiv.org/abs/1508.03619
  8. Ravi Bhargava, Benjamin Serebrin, Francesco Spadini, and Srilatha Manne. 2008. Accelerating Two-dimensional Page Walks for Virtualized Systems. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIII). ACM, New York, NY, USA, 26--35.
  9. Abhishek Bhattacharjee. 2013. Large-reach Memory Management Unit Caches. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 383--394.
  10. Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. 2008. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08). ACM, New York, NY, USA, 72--81.
  11. Daniel Bovet and Marco Cesati. 2005. Understanding The Linux Kernel. O'Reilly & Associates Inc.
  12. Josiah L. Carlson. 2013. Redis in Action. Manning Publications Co., Greenwich, CT, USA.
  13. Jonathan Corbet. 2010. Memory compaction. https://lwn.net/Articles/368869/
  14. Guilherme Cox and Abhishek Bhattacharjee. 2017. Efficient Address Translation for Architectures with Multiple Page Sizes. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 435--448.
  15. Cort Dougan, Paul Mackerras, and Victor Yodaiken. 1999. Optimizing the Idle Task and Other MMU Tricks. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI '99). USENIX Association, Berkeley, CA, USA, 229--237. http://dl.acm.org/citation.cfm?id=296806.296833
  16. Niall Douglas. 2011. User Mode Memory Page Allocation: A Silver Bullet For Memory Allocation? CoRR, Vol. abs/1105.1811 (2011). arXiv:1105.1811 http://arxiv.org/abs/1105.1811
  17. Ulrich Drepper. 2007. What Every Programmer Should Know About Memory.
  18. Yu Du, Miao Zhou, Bruce R. Childers, Daniel Mossé, and Rami G. Melhem. 2015. Supporting superpages in non-contiguous physical memory. 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) (2015), 223--234.
  19. M. Franklin, D. Yeung, Xue Wu, A. Jaleel, K. Albayraktaroglu, B. Jacob, and Chau-Wen Tseng. 2005. BioBench: A Benchmark Suite of Bioinformatics Applications. In IEEE International Symposium on Performance Analysis of Systems and Software, 2005 (ISPASS 2005). 2--9.
  20. Jayneel Gandhi, Arkaprava Basu, Mark D. Hill, and Michael M. Swift. 2014. Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, Washington, DC, USA, 178--189.
  21. Fabien Gaud, Baptiste Lepers, Jeremie Decouchant, Justin Funston, Alexandra Fedorova, and Vivien Quéma. 2014. Large Pages May Be Harmful on NUMA Systems. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'14). USENIX Association, Berkeley, CA, USA, 231--242. http://dl.acm.org/citation.cfm?id=2643634.2643659
  22. Mel Gorman and Patrick Healy. 2008. Supporting Superpage Allocation Without Additional Hardware Support. In Proceedings of the 7th International Symposium on Memory Management (ISMM '08). ACM, New York, NY, USA, 41--50.
  23. Mel Gorman and Patrick Healy. 2012. Performance Characteristics of Explicit Superpage Support. In Proceedings of the 2010 International Conference on Computer Architecture (ISCA'10). Springer-Verlag, Berlin, Heidelberg, 293--310.
  24. Mel Gorman and Andy Whitcroft. 2006. The What, The Why and the Where To of Anti-Fragmentation. In Linux Symposium. 141.
  25. Fei Guo, Seongbeom Kim, Yury Baskakov, and Ishan Banerjee. 2015. Proactively Breaking Large Pages to Improve Memory Overcommitment Performance in VMware ESXi. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '15). ACM, New York, NY, USA, 39--51.
  26. Fan Guo, Yongkun Li, Yinlong Xu, Song Jiang, and John C. S. Lui. 2017. SmartMD: A High Performance Deduplication Engine with Mixed Pages. In Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '17). USENIX Association, Berkeley, CA, USA, 733--744. http://dl.acm.org/citation.cfm?id=3154690.3154759
  27. John L. Henning. 2006. SPEC CPU2006 Benchmark Descriptions. SIGARCH Comput. Archit. News, Vol. 34, 4 (Sept. 2006), 1--17.
  28. Vasileios Karakostas, Osman S. Unsal, Mario Nemirovsky, Adrián Cristal, and Michael M. Swift. 2014. Performance analysis of the memory management unit under scale-out workloads. 2014 IEEE International Symposium on Workload Characterization (IISWC) (2014), 1--12.
  29. Youngjin Kwon, Hangchen Yu, Simon Peter, Christopher J. Rossbach, and Emmett Witchel. 2016. Coordinated and Efficient Huge Page Management with Ingens. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 705--721. http://dl.acm.org/citation.cfm?id=3026877.3026931
  30. Joshua Magee and Apan Qasem. 2009. A Case for Compiler-driven Superpage Allocation. In Proceedings of the 47th Annual Southeast Regional Conference (ACM-SE 47). ACM, New York, NY, USA, Article 82, 4 pages.
  31. Juan Navarro, Sitaram Iyer, Peter Druschel, and Alan Cox. 2002. Practical, Transparent Operating System Support for Superpages. SIGOPS Oper. Syst. Rev., Vol. 36, SI (Dec. 2002), 89--104.
  32. Ashish Panwar, Naman Patel, and K. Gopinath. 2016. A Case for Protecting Huge Pages from the Kernel. In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys '16). ACM, New York, NY, USA, Article 15, 8 pages.
  33. Ashish Panwar, Aravinda Prasad, and K. Gopinath. 2018. Making Huge Pages Actually Useful. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18). ACM, New York, NY, USA, 679--692.
  34. Binh Pham, Abhishek Bhattacharjee, Yasuko Eckert, and Gabriel H. Loh. 2014. Increasing TLB reach by exploiting clustering in page translations. 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (2014), 558--567.
  35. Binh Pham, Viswanathan Vaidyanathan, Aamer Jaleel, and Abhishek Bhattacharjee. 2012. CoLT: Coalesced Large-Reach TLBs. In Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-45). IEEE Computer Society, Washington, DC, USA, 258--269.
  36. Binh Pham, Ján Veselý, Gabriel H. Loh, and Abhishek Bhattacharjee. 2015. Large Pages and Lightweight Memory Management in Virtualized Environments: Can You Have It Both Ways? In Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). ACM, New York, NY, USA, 1--12.
  37. Jee Ho Ryoo, Nagendra Gulur, Shuang Song, and Lizy K. John. 2017. Rethinking TLB Designs in Virtualized Environments: A Very Large Part-of-Memory TLB. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 469--480.
  38. Indira Subramanian, Clifford Mather, Kurt Peterson, and Balakrishna Raghunath. 1998. Implementation of Multiple Pagesize Support in HP-UX. In USENIX Annual Technical Conference. 105--119.
  39. John R Tramm, Andrew R Siegel, Tanzima Islam, and Martin Schulz. n.d. XSBench - The Development and Verification of a Performance Abstraction for Monte Carlo Reactor Analysis. In PHYSOR 2014 - The Role of Reactor Physics toward a Sustainable Future. Kyoto.
  40. Koji Ueno and Toyotaro Suzumura. 2012. Highly Scalable Graph Search for the Graph500 Benchmark. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (HPDC '12). ACM, New York, NY, USA, 149--160.
  41. Carl A. Waldspurger. 2002. Memory Resource Management in VMware ESX Server. SIGOPS Oper. Syst. Rev., Vol. 36, SI (Dec. 2002), 181--194.
  42. Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. 2009. Towards Practical Page Coloring-based Multicore Cache Management. In Proceedings of the 4th ACM European Conference on Computer Systems (EuroSys '09). ACM, New York, NY, USA, 89--102.

  • Published in

    ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2019
    1126 pages
    ISBN: 9781450362405
    DOI: 10.1145/3297858

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    ASPLOS '19 paper acceptance rate: 74 of 351 submissions, 21%. Overall acceptance rate: 535 of 2,713 submissions, 20%.
