GPUKV: an integrated framework with KVSSD and GPU through P2P communication support

Authors:

Youngjae Kim

SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing

Pages 1156 - 1164

https://doi.org/10.1145/3412841.3441990

Published: 22 April 2021

Abstract

In a conventional GPU-driven computing model, loading data from a key-value store to the GPU incurs the overhead of the full I/O stacks of both the key-value store and the file system. This paper presents GPUKV, a GPU-driven computing framework that eliminates this overhead while using fewer host-side resources such as CPU and memory. GPUKV has three features: (i) it provides a key-value store abstraction to the GPU; (ii) it loads data from the key-value store to the GPU through PCIe peer-to-peer (P2P) communication, without copying the data into user-space or kernel-space memory; and (iii) it uses a KVSSD, which implements a key-value store inside an SSD, completely eliminating interaction with the key-value store and file system during P2P communication. We have developed GPUKV with a KVSSD implemented on the Cosmos+ OpenSSD platform in a Linux environment. Our extensive evaluations demonstrate that GPUKV improves execution time by up to 18.7 times and reduces host CPU cycle usage by up to 175 times compared to conventional CPU-based GPU computing models.
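
The difference between the two data paths described in the abstract can be illustrated with a toy model. The sketch below is not GPUKV code and does not use any real driver or CUDA API; the region names (`kernel_page_cache`, `user_buffer`, and so on) and the `Path` and `host_copies` helpers are hypothetical, chosen only to show which host-memory hops a value would pass through on each path.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for host (CPU-side) memory regions; illustrative only.
HOST_REGIONS = {"kernel_page_cache", "user_buffer"}


@dataclass
class Path:
    """An ordered list of places a value visits on its way to the GPU."""
    name: str
    hops: List[str]


def host_copies(path: Path) -> int:
    """Count the hops that land in host memory."""
    return sum(1 for hop in path.hops if hop in HOST_REGIONS)


# Conventional path (assumed shape): the value is staged in kernel-space
# and user-space memory before it is copied to the GPU.
conventional = Path(
    "conventional",
    ["ssd", "kernel_page_cache", "user_buffer", "gpu_memory"],
)

# P2P path (assumed shape): the KVSSD transfers the value over PCIe
# directly into GPU memory, with no host-memory staging.
p2p = Path("p2p", ["kvssd", "gpu_memory"])

for path in (conventional, p2p):
    print(f"{path.name}: {host_copies(path)} host-memory copies")
```

The model counts only staging copies; it says nothing about bandwidth, latency, or the reported speedups, which depend on the hardware and workloads evaluated in the paper.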

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Data flow architectures
  2. Real-time systems
    1. Real-time operating systems
2. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors

Published In

SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing

March 2021

2075 pages

ISBN:9781450381048

DOI:10.1145/3412841

Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Jiman Hong (Soongsil University, South Korea)

Program Chairs: Alessio Bechini (University of Pisa, Italy), Eunjee Song (Baylor University)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

SK hynix

Conference

SAC '21

Sponsor:

SIGAPP

SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing

March 22 - 26, 2021

Virtual Event, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
