LATR: Lazy Translation Coherence

Authors: Mohan Kumar Kumar, Sanidhya Kashyap, Abhishek Bhattacharjee, Tushar Krishna

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 651 - 664

https://doi.org/10.1145/3173162.3173198

Published: 19 March 2018

Abstract

We propose LATR (lazy TLB coherence), a software-based TLB shootdown mechanism that can alleviate the overhead of the synchronous TLB shootdown mechanism in existing operating systems. By handling TLB coherence lazily, LATR avoids both the expensive inter-processor interrupts (IPIs) required to deliver a shootdown signal to remote cores and the performance overhead of the associated interrupt handlers. Virtual memory operations such as free and page migration can therefore benefit significantly from LATR's mechanism. For example, LATR improves the latency of munmap() by 70.8% on a 2-socket machine, a widely used configuration in modern data centers. Real-world, performance-critical applications such as web servers can also benefit: without any application-level changes, LATR improves Apache by 59.9% compared to Linux, and by 37.9% compared to ABIS, a highly optimized, state-of-the-art TLB coherence technique.
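To make the contrast between synchronous and lazy shootdown concrete, the following is a minimal Python simulation. It is only a sketch under stated assumptions and not the paper's implementation: the abstract says only that coherence is handled lazily so that IPIs are avoided. The class and method names (`LazyCoherence`, `munmap_sync`, `munmap_lazy`, `tick`), the per-state set of pending cores, the choice to sweep at a periodic per-core tick, and the rule that frames are reused only after every core has swept are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LazyState:
    """One pending invalidation (hypothetical record layout)."""
    start: int     # first virtual page number of the unmapped range
    end: int       # one past the last virtual page number
    pending: set   # ids of cores that may still hold a stale TLB entry


class Core:
    def __init__(self, cid):
        self.cid = cid
        self.tlb = {}  # virtual page number -> physical frame


class LazyCoherence:
    """Toy model comparing synchronous and lazy TLB shootdown."""

    def __init__(self, ncores):
        self.cores = [Core(i) for i in range(ncores)]
        self.states = []        # invalidations not yet seen by all cores
        self.deferred = []      # (state, frames) not yet safe to reuse
        self.free_frames = []   # frames that may be handed out again
        self.ipis_sent = 0      # simulated inter-processor interrupts

    @staticmethod
    def _invalidate(core, start, end):
        for vpn in range(start, end):
            core.tlb.pop(vpn, None)

    def munmap_sync(self, initiator, start, end, frames):
        # Synchronous baseline: interrupt every remote core and wait,
        # so the frames can be reused as soon as the call returns.
        for core in self.cores:
            if core.cid != initiator:
                self.ipis_sent += 1
            self._invalidate(core, start, end)
        self.free_frames.extend(frames)

    def munmap_lazy(self, initiator, start, end, frames):
        # Lazy variant (assumption): invalidate locally, record the range,
        # send no IPIs, and hold the frames back until all cores caught up.
        self._invalidate(self.cores[initiator], start, end)
        pending = {c.cid for c in self.cores if c.cid != initiator}
        if not pending:
            self.free_frames.extend(frames)
            return
        state = LazyState(start, end, pending)
        self.states.append(state)
        self.deferred.append((state, frames))

    def tick(self, cid):
        # Assumed sweep point: each core applies pending invalidations
        # on its own, e.g. at a periodic tick, instead of being interrupted.
        core = self.cores[cid]
        for state in self.states:
            if cid in state.pending:
                self._invalidate(core, state.start, state.end)
                state.pending.discard(cid)
        still_deferred = []
        for state, frames in self.deferred:
            if state.pending:
                still_deferred.append((state, frames))
            else:
                self.free_frames.extend(frames)
        self.deferred = still_deferred
        self.states = [s for s in self.states if s.pending]
```

In this model the lazy path trades immediate frame reuse for the absence of interrupts: a remote core may keep a stale translation until its next sweep, which is why the sketch withholds the freed frames until the pending set is empty.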

Index Terms

LATR: Lazy Translation Coherence
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

March 2018

827 pages

ISBN:9781450349116

DOI:10.1145/3173162

General Chairs: Xipeng Shen (North Carolina State University, USA), James Tuck (North Carolina State University, USA)

Program Chairs: Ricardo Bianchini (Microsoft Research, USA), Vivek Sarkar (Georgia Institute of Technology, USA)

ACM SIGPLAN Notices Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
Editor: Matthew Fluet, Rochester Institute of Technology

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Defense Advanced Research Projects Agency
Office of Naval Research
Electronics and Telecommunications Research Institute
National Science Foundation

Conference

ASPLOS '18: Architectural Support for Programming Languages and Operating Systems

March 24 - 28, 2018

Williamsburg, VA, USA

Acceptance Rates

ASPLOS '18 Paper Acceptance Rate 56 of 319 submissions, 18%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%
