DOI: 10.1145/3477314.3507110
research-article

Effective TLB thrashing: unveiling the true short reach of modern TLB designs

Published: 06 May 2022

ABSTRACT

The Memory Management Unit (MMU) in modern processors includes a Translation Lookaside Buffer (TLB) that caches recently used Page-Table Entries (PTEs) and thereby avoids redundant page-table walks during address translation. The amount of memory that the TLB can translate without a page-table walk is commonly known as its reach. The TLB size, and thus its reach, is limited, however, because the TLB sits on the critical path to the cache memory and must therefore keep its access latency low.

While extensive research has been devoted to lessening TLB pressure, it has generally been presumed that the TLB reach is strictly defined by the number of TLB entries, as if the TLB were a fully associative cache structure. In this work, however, we demonstrate that the number of TLB entries only sets a theoretical upper bound on the TLB reach, and we reveal how the TLB's actual indexing circuitry can reduce the effective TLB reach by 256 KB in some Intel processors compared to its PTE storage capacity.
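The gap between entry count and effective reach can be illustrated with a toy set-associative model. The sketch below is not the indexing scheme reported in the paper: the geometry (128 sets, 12 ways), the modulo index function, the LRU replacement policy, and the class name `ToySetAssocTLB` are all assumptions chosen for illustration. It only shows that, once entries are indexed by set, a handful of conflicting pages can evict a target entry long before the full capacity is used.

```python
from collections import OrderedDict


class ToySetAssocTLB:
    """Hypothetical set-associative TLB with LRU replacement (illustration only)."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # One ordered map per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def set_index(self, vpn: int) -> int:
        # Assumed linear indexing: low bits of the virtual page number.
        # Real hardware may hash the page number differently.
        return vpn % self.num_sets

    def access(self, vpn: int) -> bool:
        """Return True on a hit, False on a miss (which would trigger a page walk)."""
        entries = self.sets[self.set_index(vpn)]
        if vpn in entries:
            entries.move_to_end(vpn)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used entry
        entries[vpn] = True
        return False

    def contains(self, vpn: int) -> bool:
        return vpn in self.sets[self.set_index(vpn)]


tlb = ToySetAssocTLB(num_sets=128, ways=12)
capacity = tlb.num_sets * tlb.ways  # 1536 entries in this toy geometry

target = 5
tlb.access(target)

# Touch 12 other pages that map to the same set as the target.
for k in range(1, tlb.ways + 1):
    tlb.access(target + k * tlb.num_sets)

target_still_cached = tlb.contains(target)
print(capacity, target_still_cached)
```

In this model the target is evicted after only 12 conflicting accesses, even though the structure holds 1536 entries, which is why capacity alone is only an upper bound on reach.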

Moreover, recent security-related work has shown that adversaries can implement PTE-based cache side-channel attacks by repeatedly forcing the MMU to perform spurious page-table walks, which can be accomplished by repeatedly exceeding the TLB reach. In Intel's Skylake, for example, the TLB can host up to 1600 PTEs, giving it a reach of 6.25 MB when using 4 KB pages. Yet we propose a target-relative TLB eviction strategy that loads only 84 handpicked PTEs into the TLB to evict a target PTE, thus letting an adversary artificially diminish the TLB reach to only 344 KB.
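The nominal reach figure quoted for Skylake follows from multiplying the number of TLB entries by the page size. A minimal sketch of that arithmetic, assuming binary units (1 KB = 1024 bytes) and using a hypothetical helper name `nominal_reach_bytes`:

```python
KIB = 1024
MIB = 1024 * KIB


def nominal_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Upper bound on TLB reach: every entry maps one distinct page."""
    return entries * page_size_bytes


# Figures from the abstract: 1600 PTEs, 4 KB pages.
skylake_reach = nominal_reach_bytes(1600, 4 * KIB)
print(skylake_reach / MIB)  # 6.25
```

This is the capacity-based upper bound only; the effective reach under an adversarial access pattern depends on how the hardware indexes and replaces entries.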


Published in

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN: 9781450387132
DOI: 10.1145/3477314

Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
