Abstract
Modern applications often require large amounts of memory. Conventional 4KB pages lead to large page tables and thus put heavy pressure on TLB address translation. This pressure is more pronounced in a virtualized system, which adds a further layer of address translation. Page walks caused by TLB misses can incur significant performance overhead. One way to reduce this overhead is to use hugepages. The Linux kernel has supported transparent hugepages since 2.6.38, providing an alternative, larger page size. In general, hugepages perform better on address translation and page table modification. This paper first analyzes the impact of hugepages on a native system and then compares their impact on different memory virtualization approaches: hardware-assisted paging (HAP), shadow paging, and para-virtualization. We observe that the current implementation of transparent hugepages is inefficient and cannot exploit the full performance advantage of hugepages. Worse, its conservative strategy may conflict with existing OS functions, which can degrade performance. We therefore propose a new memory allocation strategy, alignment-based hugepage (ABH), that promotes hugepage allocation. We apply ABH to different paging modes in virtualized systems. The results show that the new allocation strategy significantly reduces TLB misses, cuts the page walk cycles caused by TLB misses by up to 90%, and thus improves the performance of real-world applications.
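As a rough illustration of why page size matters, the sketch below counts the leaf page-table entries needed to map a region with 4KB versus 2MB pages, and the worst-case number of memory references in a page walk for native and nested (hardware-assisted) paging. The 4-level page table and the 2MB hugepage size are assumptions matching x86-64, and the function names are illustrative; none of this is taken from the paper's code.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def leaf_entries(mapped_bytes, page_size):
    """Leaf page-table entries needed to map mapped_bytes (ceiling division)."""
    return -(-mapped_bytes // page_size)


def native_walk_refs(levels):
    """Worst-case memory references for a native page walk: one per level."""
    return levels


def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for a two-dimensional (nested) page walk.

    Each guest page-table pointer, plus the final guest physical address,
    must itself be translated by a host walk, giving
    (guest_levels + 1) * (host_levels + 1) - 1 references.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


if __name__ == "__main__":
    # Assumed sizes: 1GB mapped, 4KB base pages, 2MB hugepages.
    print("4KB leaf entries:", leaf_entries(GB, 4 * KB))
    print("2MB leaf entries:", leaf_entries(GB, 2 * MB))
    # Assumed 4-level tables; a 2MB mapping stops one level earlier.
    print("native walk, 4KB pages:", native_walk_refs(4))
    print("nested walk, 4KB pages:", nested_walk_refs(4, 4))
    print("nested walk, 2MB pages:", nested_walk_refs(3, 3))
```

Under these assumptions, a hugepage mapping shrinks the leaf table by a factor of 512 and shortens every walk by one level in each dimension, which is why the effect is larger under nested paging than on a native system.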
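Highlights
Many of today's applications require ever-larger amounts of memory. Conventional 4KB pages cause excessive address translation overhead. The problem is more pronounced in virtualized environments, which add an extra layer of address translation. One way to reduce address translation overhead is to use hugepages. In general, hugepages outperform ordinary 4KB pages in page table accesses and page fault handling. The Linux kernel has supported transparent hugepages since 2.6.38; they allocate hugepages to programs without affecting user programs, improving performance. However, transparent hugepages have a shortcoming: hugepages impose additional alignment requirements that the current implementation cannot satisfy.
This paper first analyzes memory performance under Linux and in virtualized environments, together with the effect of transparent hugepages. It finds that, because of address alignment restrictions, the usage efficiency of transparent hugepages is below 25% in many cases. It then proposes an alignment-based memory management scheme that raises the proportion of hugepages in use and improves program performance.
The new memory management scheme is evaluated under Linux and several virtualized environments. The experimental results show that the new scheme reduces page table access overhead by up to 90%; that, among virtualized environments, the shadow paging mode of KVM performs best; and that the shadow paging mode of Xen currently cannot use hugepages but could achieve better performance by supporting them.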
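The alignment restriction can be illustrated with a small sketch. It assumes that a hugepage can back only a range that starts on a hugepage boundary and spans a full hugepage, which is the usual constraint for 2MB pages on x86-64. The helper names are hypothetical, and this is not the paper's ABH implementation, only an illustration of why an unaligned region loses hugepage coverage.

```python
MB = 1024 * 1024
HUGEPAGE = 2 * MB  # assumed hugepage size (2MB on x86-64)


def hugepage_coverable(start, end, huge=HUGEPAGE):
    """Bytes of [start, end) that fully aligned hugepages can cover."""
    first = -(-start // huge) * huge  # round start up to a hugepage boundary
    last = (end // huge) * huge       # round end down to a hugepage boundary
    return max(0, last - first)


def hugepage_ratio(start, end, huge=HUGEPAGE):
    """Fraction of the region [start, end) that hugepages can cover."""
    if end <= start:
        return 0.0
    return hugepage_coverable(start, end, huge) / (end - start)


if __name__ == "__main__":
    # A 2MB region starting at an unaligned address: no hugepage fits.
    print(hugepage_ratio(1 * MB, 3 * MB))
    # The same-sized region starting on a 2MB boundary: fully covered.
    print(hugepage_ratio(2 * MB, 4 * MB))
    # A 3MB unaligned region: only one hugepage fits.
    print(hugepage_ratio(1 * MB, 4 * MB))
```

In this simplified model, two regions of identical size differ only in their start address, yet one can be fully backed by a hugepage and the other not at all. An allocator that chooses aligned start addresses avoids that loss.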
Cite this article
Wang, X., Luo, T., Hu, J. et al. Evaluating the impacts of hugepage on virtual machines. Sci. China Inf. Sci. 60, 012103 (2017). https://doi.org/10.1007/s11432-015-0764-7
DOI: https://doi.org/10.1007/s11432-015-0764-7