DOI: 10.1145/3613424.3614276
research-article

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

Published: 08 December 2023

Abstract

Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) large software-managed TLBs. Unfortunately, both solutions have significant drawbacks: increased access latency, power and area (for hardware TLBs), and costly memory accesses, the need for large contiguous memory blocks, and complex OS modifications (for software-managed TLBs).
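To make the cost of a PTW concrete: with a four-level radix-tree page table, a TLB miss can require up to four dependent memory accesses. Under nested paging in a virtualized environment, the worst case grows to 24. The minimal sketch below assumes x86-64 four-level paging with 4 KiB pages and ignores page-walk caches. It shows how a virtual address is split into per-level indices and how the worst-case access count scales. The function names are illustrative and are not part of Victima.

```python
# Minimal sketch (assumption: x86-64 four-level paging, 4 KiB pages,
# no page-walk caches). Names are illustrative, not from Victima.

PAGE_SHIFT = 12                      # 4 KiB page -> 12-bit page offset
INDEX_BITS = 9                       # 512 entries per page-table page
LEVELS = ("PML4", "PDPT", "PD", "PT")


def walk_indices(vaddr: int) -> dict:
    """Split a 48-bit virtual address into per-level table indices."""
    indices = {}
    for depth, name in enumerate(LEVELS):
        # PML4 uses bits 47:39, PDPT 38:30, PD 29:21, PT 20:12.
        shift = PAGE_SHIFT + INDEX_BITS * (len(LEVELS) - 1 - depth)
        indices[name] = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
    indices["offset"] = vaddr & ((1 << PAGE_SHIFT) - 1)
    return indices


def native_walk_accesses(levels: int = 4) -> int:
    """Worst case: one dependent memory access per page-table level."""
    return levels


def nested_walk_accesses(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case two-dimensional walk under nested paging.

    Each of the g guest page-table reads needs an h-step host walk to
    translate its guest-physical address, and so does the final data
    address: g * (h + 1) + h = (g + 1) * (h + 1) - 1.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1
```

For example, `walk_indices((3 << 39) | (5 << 30) | (7 << 21) | (9 << 12) | 0x123)` yields indices 3, 5, 7 and 9 with offset 0x123. `nested_walk_accesses()` returns 24, compared with 4 for `native_walk_accesses()`.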
We present Victima, a new software-transparent mechanism that drastically increases the translation reach of the processor by leveraging the underutilized resources of the cache hierarchy. The key idea of Victima is to repurpose L2 cache blocks to store clusters of TLB entries, thereby providing an additional low-latency and high-capacity component that backs up the last-level TLB and thus reduces PTWs. Victima has two main components. First, a PTW cost predictor (PTW-CP) identifies costly-to-translate addresses based on the frequency and cost of the PTWs they lead to. Leveraging the PTW-CP, Victima uses the valuable cache space only for TLB entries that correspond to costly-to-translate pages, reducing the impact on cached application data. Second, a TLB-aware cache replacement policy prioritizes keeping TLB entries in the cache hierarchy by considering (i) the translation pressure (e.g., last-level TLB miss rate) and (ii) the reuse characteristics of the TLB entries.
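The two components can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration. This includes the 64-byte block holding eight 8-byte PTEs, the hypothetical `PtwCostPredictor` with its counter widths and thresholds, and the victim-selection rule in `choose_victim`. It is not Victima's actual PTW-CP or replacement policy, which are in the authors' repository. It only shows the shape of the idea: neighbouring PTEs are clustered into one cache block, clusters are admitted only for pages predicted to be costly to translate, and reused TLB blocks are protected when translation pressure is high.

```python
from dataclasses import dataclass, field

# Assumed layout: a 64-byte L2 block holds eight 8-byte PTEs.
BLOCK_BYTES = 64
PTE_BYTES = 8
PTES_PER_BLOCK = BLOCK_BYTES // PTE_BYTES


def cluster_tag(vpn: int) -> int:
    """Virtual page numbers that share one TLB-entry cluster (cache block)."""
    return vpn // PTES_PER_BLOCK


@dataclass
class PtwCostPredictor:
    """Hypothetical threshold predictor; not Victima's actual PTW-CP."""
    freq_threshold: int = 4        # walks before a page counts as frequent
    cost_threshold: int = 100      # walk latency (cycles) considered costly
    max_count: int = 15            # saturating 4-bit counter (assumed)
    freq: dict = field(default_factory=dict)   # vpn -> walk count
    cost: dict = field(default_factory=dict)   # vpn -> last walk latency

    def record_walk(self, vpn: int, cycles: int) -> None:
        self.freq[vpn] = min(self.freq.get(vpn, 0) + 1, self.max_count)
        self.cost[vpn] = cycles

    def is_costly(self, vpn: int) -> bool:
        """Admit a cluster into L2 only if walks are frequent and slow."""
        return (self.freq.get(vpn, 0) >= self.freq_threshold
                and self.cost.get(vpn, 0) >= self.cost_threshold)


@dataclass
class Line:
    is_tlb: bool   # block holds a TLB-entry cluster rather than data
    reuse: int     # hits since insertion
    age: int       # larger means less recently used


def choose_victim(ways: list, translation_pressure: float,
                  pressure_threshold: float = 5.0) -> int:
    """Illustrative TLB-aware victim selection within one cache set.

    translation_pressure: e.g. last-level TLB misses per kilo-instruction.
    Under high pressure, evict data blocks or never-reused TLB blocks
    first; otherwise fall back to plain LRU over the whole set.
    """
    candidates = list(range(len(ways)))
    if translation_pressure >= pressure_threshold:
        preferred = [i for i in candidates
                     if not ways[i].is_tlb or ways[i].reuse == 0]
        if preferred:
            candidates = preferred
    return max(candidates, key=lambda i: ways[i].age)
```

In this sketch, a PTE cluster would be written into L2 only when `is_costly(vpn)` returns True. Cache space therefore goes only to costly-to-translate pages, while `choose_victim` keeps reused TLB blocks resident when the last-level TLB is under pressure.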
Our evaluation results show that in native (virtualized) execution environments Victima improves average end-to-end application performance by 7.4% (28.7%) over the baseline four-level radix-tree-based page table design and by 6.2% (20.1%) over a state-of-the-art software-managed TLB, across 11 diverse data-intensive workloads. Victima delivers performance similar to that of a system that employs an optimistic 128K-entry L2 TLB, while avoiding the associated area and power overheads. Victima (i) is effective in both native and virtualized environments, (ii) is completely transparent to application and system software, (iii) unlike large software-managed TLBs, does not require contiguous physical allocations, (iv) is compatible with modern large page mechanisms, and (v) incurs very small area and power overheads on a modern high-end CPU. The source code of Victima is freely available at https://github.com/CMU-SAFARI/Victima.
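The comparison with a 128K-entry L2 TLB is easier to read in terms of translation reach, the amount of memory a structure can map without a PTW. The back-of-the-envelope sketch below assumes 4 KiB pages, 64-byte cache blocks, 8-byte PTEs and a hypothetical 2 MiB L2 cache. These sizes are illustrative assumptions, not figures from the paper.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int = 4 * KIB) -> int:
    """Bytes of memory mapped by a TLB holding one page per entry."""
    return entries * page_size


def cache_tlb_reach(cache_bytes: int, fraction_for_tlb: float,
                    block_bytes: int = 64, pte_bytes: int = 8,
                    page_size: int = 4 * KIB) -> int:
    """Reach if a fraction of cache blocks each hold a cluster of PTEs."""
    blocks = int(cache_bytes * fraction_for_tlb) // block_bytes
    ptes_per_block = block_bytes // pte_bytes
    return blocks * ptes_per_block * page_size
```

Under these assumptions, `tlb_reach(128 * 1024)` is 512 MiB. `cache_tlb_reach(2 * MIB, 0.5)` is also 512 MiB, because each 64-byte block would map 8 × 4 KiB = 32 KiB. Half of such a cache holding PTE clusters would therefore match the reach of the 128K-entry TLB.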


Cited By

  • (2024) A Case for Speculative Address Translation with Rapid Validation for GPUs. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 278-292. DOI: 10.1109/MICRO61859.2024.00029. Online publication date: 2-Nov-2024.
  • (2024) Distributed Page Table: Harnessing Physical Memory as an Unbounded Hashed Page Table. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 36-49. DOI: 10.1109/MICRO61859.2024.00013. Online publication date: 2-Nov-2024.


Published In

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023
1528 pages
ISBN:9798400703294
DOI:10.1145/3613424
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Address Translation
  2. Cache
  3. Memory Hierarchy
  4. Memory Systems
  5. Microarchitecture
  6. TLB
  7. Virtual Memory
  8. Virtualization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Google, Huawei, Intel, Microsoft, VMware, SRC, EFCL

Conference

MICRO '23

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 606
  • Downloads (last 6 weeks): 54
Reflects downloads up to 13 Feb 2025.
