ABSTRACT
The TLB is increasingly a bottleneck for big data applications. In most designs, the number of TLB entries is highly constrained by latency requirements and is growing much more slowly than the working sets of applications. Many solutions to this problem, such as huge pages, perforated pages, or TLB coalescing, rely on physical contiguity for performance gains, yet the cost of defragmenting memory can easily nullify these gains. This paper introduces Mosaic pages, which increase TLB reach by compressing multiple, discrete translations into one TLB entry. Mosaic leverages virtual contiguity for locality, but does not use physical contiguity. Mosaic relies on recent advances in hashing theory to constrain memory mappings, in order to realize this physical address compression without reducing memory utilization or increasing swapping. This paper presents a full-system prototype of Mosaic, in gem5 and modified Linux. In simulation and with comparable hardware to a traditional design, Mosaic reduces TLB misses in several workloads by 6-81%. Our results show that Mosaic’s constraints on memory mappings do not harm performance: we never see conflicts before memory is 98% full in our experiments, at which point a traditional design would also likely swap. Once memory is over-committed, Mosaic swaps fewer pages than Linux in most cases. Finally, we present timing and area analysis for a Verilog implementation of the hashing function required on the critical path for the TLB, and show that on a commercial 28nm CMOS process, the circuit runs at a maximum frequency of 4 GHz, indicating that a Mosaic TLB is unlikely to affect clock frequency.
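The idea of constraining mappings with a hash so that several translations fit in one TLB entry can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the constants, the function names, and the use of BLAKE2 as the hash are hypothetical and are not the paper's hash function, parameters, or allocation policy. The sketch shows only the general principle: if each virtual page may live in just a few hash-chosen frames, a TLB entry can store a small slot index per page instead of a full frame number.

```python
import hashlib

# All constants below are hypothetical, chosen only for illustration.
NUM_FRAMES = 1 << 20      # e.g. 4 GiB of 4 KiB frames
CANDIDATES = 8            # frames each virtual page is allowed to occupy
PAGES_PER_ENTRY = 4       # virtually contiguous pages sharing one TLB entry


def candidate_frames(vpn: int) -> list[int]:
    """Hash a virtual page number to the fixed set of frames it may occupy."""
    frames = []
    for slot in range(CANDIDATES):
        digest = hashlib.blake2b(f"{vpn}:{slot}".encode(), digest_size=8).digest()
        frames.append(int.from_bytes(digest, "little") % NUM_FRAMES)
    return frames


def place(vpn: int, used: set[int]) -> int:
    """Pick the first free candidate frame and return its slot index."""
    for slot, frame in enumerate(candidate_frames(vpn)):
        if frame not in used:
            used.add(frame)
            return slot
    # Every candidate is taken: a conflict, so some page would have to be evicted.
    raise MemoryError("no free candidate frame for this virtual page")


def translate(base_vpn: int, slots: list[int], vpn: int) -> int:
    """Recover a frame number from a compressed entry by re-hashing the page."""
    return candidate_frames(vpn)[slots[vpn - base_vpn]]


used: set[int] = set()
base = 0x1000
# One compressed entry: a slot index for each of the contiguous virtual pages.
slots = [place(base + i, used) for i in range(PAGES_PER_ENTRY)]
frames = [translate(base, slots, base + i) for i in range(PAGES_PER_ENTRY)]

# Bits needed per entry for full frame numbers versus slot indices.
bits_full = PAGES_PER_ENTRY * (NUM_FRAMES.bit_length() - 1)
bits_compressed = PAGES_PER_ENTRY * (CANDIDATES.bit_length() - 1)
print(frames, bits_full, bits_compressed)
```

Under these assumed parameters, the four frames need not be physically contiguous, yet the entry stores 3 bits per page rather than 20. The `MemoryError` branch corresponds to the conflicts the abstract mentions, which a constrained mapping can hit before memory is completely full.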
Index Terms
- Mosaic Pages: Big TLB Reach with Small Pages