DOI: 10.1145/3655038.3665944
research-article
Open access

Improving Virtualized I/O Performance by Expanding the Polled I/O Path of Linux

Published: 08 July 2024

Abstract

The continuing advancement of storage technology has introduced ultra-low latency (ULL) SSDs with access latencies of 20 μs or less. At such latencies, the context-switching overhead of interrupt-driven I/O completion becomes more pronounced, prompting consideration of polling as an alternative that mitigates this overhead. At the same time, the high price of ULL SSDs is a major obstacle to the wide adoption of polling.
We claim that virtualized systems can benefit from polling even without ULL SSDs. Because the host page cache resides in DRAM main memory, it can deliver even higher throughput than ULL SSDs. However, the guest operating system in a virtualized environment cannot use polled I/Os when accessing the host page cache and therefore fails to exploit the performance advantage of DRAM. To resolve this inefficiency, we propose expanding the polled I/O path of the Linux kernel I/O stack. Our approach allows guest applications to use I/O polling for buffered I/Os and memory-mapped I/Os. The expanded I/O path can significantly improve the I/O performance of virtualized systems without modifying the guest application or the backend of the virtual block device. Our proposed buffered I/O path with polling improves 4 KB random read throughput between guest applications and the host page cache by 3.2×.
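
For readers unfamiliar with how an application asks Linux for polled I/O, the following is a minimal sketch and not the authors' implementation. It assumes a Linux host where Python exposes `os.preadv` and `os.RWF_HIPRI`; the helper names `read_with_polling_hint` and `demo` are hypothetical. According to the `preadv2(2)` manual page, the `RWF_HIPRI` hint currently enables polling only on file descriptors opened with `O_DIRECT`, so on the buffered file descriptor used here it is accepted but has no polling effect. That buffered case is the gap the abstract addresses.

```python
import os
import tempfile


def read_with_polling_hint(fd: int, length: int, offset: int) -> bytes:
    """Read up to `length` bytes at `offset`, passing RWF_HIPRI when available."""
    buf = bytearray(length)
    hipri = getattr(os, "RWF_HIPRI", 0)  # 0 on platforms without the flag
    if hipri and hasattr(os, "preadv"):
        try:
            # Equivalent to preadv2(fd, iov, 1, offset, RWF_HIPRI) in C.
            n = os.preadv(fd, [buf], offset, hipri)
            return bytes(buf[:n])
        except OSError:
            pass  # kernel or filesystem rejected the flag; fall back below
    return os.pread(fd, length, offset)


def demo() -> bytes:
    """Write one 4 KB block to a temporary file and read it back (buffered)."""
    block = b"x" * 4096
    with tempfile.NamedTemporaryFile() as f:
        f.write(block)
        f.flush()
        return read_with_polling_hint(f.fileno(), 4096, 0)
```

The sketch returns the same data with or without the hint; only the completion mechanism inside the kernel would differ, and only where the kernel honors the flag.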

Cited By

  • (2024) DeLiBA-K: Speeding-up Hardware-Accelerated Distributed Storage Access by Tighter Linux Kernel Integration and Use of a Modern API. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 531-544. DOI: 10.1109/SCW63240.2024.00075. Online publication date: 17-Nov-2024

Published In

HotStorage '24: Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems
July 2024
141 pages
ISBN: 9798400706301
DOI: 10.1145/3655038
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Host page caching
  2. Polled I/Os
  3. Storage virtualization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HOTSTORAGE '24

Acceptance Rates

Overall Acceptance Rate 34 of 87 submissions, 39%

Article Metrics

  • Downloads (Last 12 months): 369
  • Downloads (Last 6 weeks): 80
Reflects downloads up to 16 Feb 2025
