DOI: 10.1145/3412841.3441990
research-article

GPUKV: an integrated framework with KVSSD and GPU through P2P communication support

Published: 22 April 2021

Abstract

In a conventional GPU-driven computing model, loading data from a key-value store to the GPU incurs the overhead of the full I/O stacks of both the key-value store and the file system. This paper presents GPUKV, a GPU-driven computing framework that eliminates this overhead while using fewer host-side resources such as CPU and memory. GPUKV has three features: (i) it provides a key-value store abstraction to the GPU; (ii) it loads data from the key-value store to the GPU through PCIe peer-to-peer (P2P) communication, without copying the data into user-space or kernel-space memory; and (iii) it uses KVSSD, which implements a key-value store inside an SSD, completely eliminating the interaction with the key-value store and file system for P2P communication. We have developed GPUKV with a KVSSD implemented on the Cosmos+ OpenSSD platform in a Linux environment. Our extensive evaluations demonstrate that GPUKV improves execution time by up to 18.7 times and reduces host CPU cycle usage by up to 175 times compared to conventional CPU-based GPU computing models.
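
The difference between the two data paths described in the abstract can be illustrated with a toy model. The sketch below is not GPUKV code: the hop names, the `HOST_MEMORY` set, and the helper functions are hypothetical, chosen only to show that the conventional path stages data in kernel-space and user-space memory while the P2P path does not.

```python
# Toy model of the two data paths contrasted in the abstract.
# All hop and buffer names are illustrative assumptions, not GPUKV internals.
from typing import List, Tuple

Hop = Tuple[str, str]  # (source, destination)

# Buffers that live in host memory (assumed names).
HOST_MEMORY = {"kernel_memory", "user_memory"}

# Conventional path: data is staged in kernel-space and user-space
# memory before it is copied to the GPU.
CONVENTIONAL_PATH: List[Hop] = [
    ("ssd", "kernel_memory"),
    ("kernel_memory", "user_memory"),
    ("user_memory", "gpu_memory"),
]

# P2P path: the KVSSD transfers data to GPU memory over PCIe
# without a stop in host memory.
P2P_PATH: List[Hop] = [("kvssd", "gpu_memory")]


def host_copies(path: List[Hop]) -> int:
    """Count the hops whose destination is host (kernel or user) memory."""
    return sum(1 for _, dst in path if dst in HOST_MEMORY)


def is_contiguous(path: List[Hop]) -> bool:
    """Check that each hop starts where the previous hop ended."""
    return all(path[i][1] == path[i + 1][0] for i in range(len(path) - 1))


if __name__ == "__main__":
    for name, path in (("conventional", CONVENTIONAL_PATH), ("p2p", P2P_PATH)):
        print(name, "host copies:", host_copies(path))
```

In this model the conventional path makes two copies into host memory and the P2P path makes none, which is the property feature (ii) of the abstract describes. The model says nothing about actual transfer cost or latency.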

Cited By

  • (2024) Storage Abstractions for SSDs: The Past, Present, and Future. ACM Transactions on Storage 21:1 (1-44). DOI: 10.1145/3708992. Online publication date: 30-Dec-2024
  • (2023) Catalyst: Optimizing Cache Management for Large In-memory Key-value Systems. Proceedings of the VLDB Endowment 16:13 (4339-4352). DOI: 10.14778/3625054.3625068. Online publication date: 1-Sep-2023
  • (2022) Survey on storage-accelerator data movement. CCF Transactions on High Performance Computing. DOI: 10.1007/s42514-022-00112-0. Online publication date: 21-Jul-2022

Published In

SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
March 2021
2075 pages
ISBN: 9781450381048
DOI: 10.1145/3412841
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPGPU
  2. key-value SSD
  3. peer-to-peer communication

Qualifiers

  • Research-article

Funding Sources

  • SK hynix

Conference

SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing
March 22 - 26, 2021
Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 3
Reflects downloads up to 17 Feb 2025
