research-article

HawkEye: Efficient Fine-grained OS Support for Huge Pages

Authors:
Ashish Panwar, Indian Institute of Science, Bangalore, India
Sorav Bansal, Indian Institute of Technology Delhi, New Delhi, India
K. Gopinath, Indian Institute of Science, Bangalore, India

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 2019, Pages 347–360. https://doi.org/10.1145/3297858.3304064

Published: 04 April 2019

ABSTRACT

Effective huge page management in operating systems is necessary for mitigation of address translation overheads. However, this continues to remain a difficult area in OS design. Recent work on Ingens uncovered some interesting pitfalls in current huge page management strategies. Using both page access patterns discovered by the OS kernel and fine-grained data from hardware performance counters, we expose problematic aspects of current huge page management strategies. In our system, called HawkEye/Linux, we demonstrate alternate ways to address issues related to performance, page fault latency and memory bloat; the primary ideas behind HawkEye management algorithms are async page pre-zeroing, de-duplication of zero-filled pages, fine-grained page access tracking and measurement of address translation overheads through hardware performance counters. Our evaluation shows that HawkEye is more performant, robust and better-suited to handle diverse workloads when compared with current state-of-the-art systems.
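Two of the ideas named in the abstract, finding zero-filled base pages inside a huge-page-sized region and expressing address translation overhead as a fraction of execution cycles, can be illustrated with a small user-space sketch. This is not the HawkEye kernel implementation: the function names, the 4 KB and 2 MB page sizes, and the definition of overhead as page-walk cycles divided by total cycles are assumptions made here for illustration, and the cycle counts would have to come from whatever hardware counters a given CPU exposes.

```python
# Illustrative sketch only; names and constants are assumptions, not HawkEye code.

PAGE_SIZE = 4096                      # assumed base page size (4 KB)
HUGE_PAGE_SIZE = 2 * 1024 * 1024      # assumed huge page size (2 MB)


def count_zero_pages(region, page_size=PAGE_SIZE):
    """Count base pages in `region` that contain only zero bytes.

    Zero-filled pages inside a huge page are candidates for
    de-duplication, since they carry no data but still consume memory.
    """
    zero_page = bytes(page_size)
    count = 0
    # Walk the region one whole base page at a time; a trailing partial
    # page (if any) is ignored.
    for offset in range(0, len(region) - page_size + 1, page_size):
        if region[offset:offset + page_size] == zero_page:
            count += 1
    return count


def translation_overhead(walk_cycles, total_cycles):
    """Return the fraction of cycles spent in page walks.

    Both inputs are assumed to be raw counter readings sampled over
    the same interval.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return walk_cycles / total_cycles


if __name__ == "__main__":
    region = bytearray(HUGE_PAGE_SIZE)   # starts out entirely zero-filled
    region[5000] = 1                     # dirty one byte in the second base page
    print(count_zero_pages(region), "of", HUGE_PAGE_SIZE // PAGE_SIZE, "pages are zero-filled")
    print("translation overhead:", translation_overhead(250, 1000))
```

A region with many zero-filled base pages suggests memory bloat that could be recovered, while a high overhead ratio suggests the process would benefit from huge pages; how HawkEye weighs the two is described in the paper itself, not in this sketch.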

Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory

Published in
ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
April 2019
1126 pages
ISBN:9781450362405
DOI:10.1145/3297858
General Chairs: Iris Bahar (Brown University), Maurice Herlihy (Brown University)
Program Chairs: Emmett Witchel (University of Texas, Austin), Alvin Lebeck (Duke University)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
hardware counters
huge pages
virtual memory
Qualifiers
- research-article
Acceptance Rates
ASPLOS '19 Paper Acceptance Rate: 74 of 351 submissions, 21%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
Article Metrics
- Total Citations: 38
- Total Downloads: 1,468
- Downloads (Last 12 months): 262
- Downloads (Last 6 weeks): 51