DOI: 10.1145/2896377.2901456
research-article

Hash, Don't Cache (the Page Table)

Published: 14 June 2016

Abstract

Radix page tables, as implemented in the x86-64 architecture, incur a penalty of four memory references for address translation on each TLB miss. These four references become 24 in virtualized setups, accounting for 5%–90% of the runtime and thus motivating chip vendors to incorporate page walk caches (PWCs). Counterintuitively, an ISCA 2010 paper found that radix page tables with PWCs are superior to hashed page tables, yielding up to 5x fewer DRAM accesses per page walk. We challenge this finding and show that it is the result of comparing against a suboptimal hashed implementation: that of the Itanium architecture. We show that, when carefully optimized, hashed page tables in fact outperform existing PWC-aided x86-64 hardware, shortening benchmark runtimes by 1%–27% and 6%–32% in bare-metal and virtualized setups, respectively, without resorting to PWCs. We further show that hashed page tables are inherently more scalable than radix designs and are better suited to accommodate ever-increasing memory sizes; their downside is that they make it more challenging to support features such as superpages.
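
The jump from four to 24 references can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the usual two-dimensional (nested) walk, in which every guest page-table entry sits at a guest-physical address that must itself be translated by a full host walk before it can be read, and the final guest-physical address needs one more host walk.

```python
def native_walk_refs(levels: int = 4) -> int:
    """Memory references for one bare-metal radix walk: one per level."""
    return levels


def nested_walk_refs(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Memory references for one two-dimensional (virtualized) walk.

    Assumption for illustration: no page walk cache hits, so every
    reference goes to memory.
    """
    # Each guest level costs a host walk to locate the guest entry,
    # plus one reference to read the guest entry itself.
    per_guest_level = host_levels + 1
    # The resulting guest-physical address needs a final host walk.
    final_translation = host_levels
    return guest_levels * per_guest_level + final_translation


if __name__ == "__main__":
    print(native_walk_refs())      # four-level x86-64 walk
    print(nested_walk_refs())      # four guest levels x four host levels
    print(nested_walk_refs(1, 1))  # idealized single-reference lookups
```

The last call treats a collision-free hashed lookup as a one-level walk, which is a simplifying assumption made here rather than a result quoted from the abstract; under that assumption the nested cost drops from 24 references to 3.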

Published In

SIGMETRICS '16: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science
June 2016
434 pages
ISBN:9781450342667
DOI:10.1145/2896377
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hashed page tables
  2. hypervisor
  3. page table design
  4. page walk caches
  5. radix page tables
  6. tlb
  7. virtual memory

Qualifiers

  • Research-article

Conference

SIGMETRICS '16
Acceptance Rates

SIGMETRICS '16 Paper Acceptance Rate 28 of 208 submissions, 13%;
Overall Acceptance Rate 459 of 2,691 submissions, 17%

Article Metrics

  • Downloads (Last 12 months): 209
  • Downloads (Last 6 weeks): 17
Reflects downloads up to 17 Jan 2025

Cited By

  • (2024) Direct Memory Translation for Virtualized Clouds. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 287-304. DOI: 10.1145/3620665.3640358. Online publication date: 27-Apr-2024.
  • (2024) FlexiMem: Flexible Shared Virtual Memory for PCIe-attached FPGAs. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), pages 78-86. DOI: 10.1109/FPL64840.2024.00020. Online publication date: 2-Sep-2024.
  • (2023) Accelerating Extra Dimensional Page Walks for Confidential Computing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 654-669. DOI: 10.1145/3613424.3614293. Online publication date: 28-Oct-2023.
  • (2023) Mosaic Pages: Big TLB Reach with Small Pages. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 433-448. DOI: 10.1145/3582016.3582021. Online publication date: 25-Mar-2023.
  • (2023) Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters. Proceedings of the 50th Annual International Symposium on Computer Architecture, pages 1-15. DOI: 10.1145/3579371.3589079. Online publication date: 17-Jun-2023.
  • (2023) Memory-Efficient Hashed Page Tables. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 1221-1235. DOI: 10.1109/HPCA56546.2023.10071061. Online publication date: Feb-2023.
  • (2022) Clio: a hardware-software co-designed disaggregated memory system. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 417-433. DOI: 10.1145/3503222.3507762. Online publication date: 28-Feb-2022.
  • (2022) Parallel virtualized memory translation with nested elastic cuckoo page tables. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 84-97. DOI: 10.1145/3503222.3507720. Online publication date: 28-Feb-2022.
  • (2021) Morrigan: A Composite Instruction TLB Prefetcher. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1138-1153. DOI: 10.1145/3466752.3480049. Online publication date: 18-Oct-2021.
  • (2021) Radiant: efficient page table management for tiered memory systems. Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management, pages 66-79. DOI: 10.1145/3459898.3463907. Online publication date: 22-Jun-2021.
