Hetero-DB: Next Generation High-Performance Database Systems by Best Utilizing Heterogeneous Computing and Storage Resources

Zhang, Kai; Chen, Feng; Ding, Xiaoning; Huai, Yin; Lee, Rubao; Luo, Tian; Wang, Kaibo; Yuan, Yuan; Zhang, Xiaodong

doi:10.1007/s11390-015-1553-y

Regular Paper
Published: 08 July 2015

Volume 30, pages 657–678, (2015)
Journal of Computer Science and Technology

Kai Zhang¹,², Feng Chen³, Xiaoning Ding⁴, Yin Huai⁵, Rubao Lee², Tian Luo⁶, Kaibo Wang², Yuan Yuan² & Xiaodong Zhang²

Abstract

With recent advances in hardware technology, new general-purpose high-performance devices such as the graphics processing unit (GPU) and the solid state drive (SSD) have been widely adopted. For applications with massive data parallelism, a GPU may offer an order of magnitude higher throughput than a multicore CPU. An SSD, in turn, can deliver much higher I/O throughput and lower latency than a traditional hard disk drive (HDD). These devices can significantly boost the performance of many applications, so the database community has been actively working to adopt them in database systems. However, the performance benefit cannot be easily reaped if the new hardware is used improperly. In this paper, we propose Hetero-DB, a high-performance database system whose design and implementation exploit both the characteristics of the database system and the special properties of the new hardware devices. Hetero-DB develops a GPU-aware query execution engine with GPU device memory management and a query scheduling mechanism to support concurrent query execution. Furthermore, for the SSD-HDD hybrid storage system, we redesign the storage engine by organizing HDD and SSD into a two-level caching hierarchy. To best utilize the hybrid hardware devices, the semantic information that is critical for storage I/O is identified and passed to the storage manager, which has great potential to improve efficiency and performance. Hetero-DB aims to maximize the performance benefits of GPU and SSD, and demonstrates its effectiveness for designing next-generation database systems.
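
The abstract does not spell out how semantic information reaches the storage manager, so the following is only an illustrative sketch of the general idea, not Hetero-DB's implementation. It assumes a toy two-level store in which a small LRU cache stands in for the SSD and a dictionary stands in for the HDD. The names AccessHint, HybridStore, and demo, and the rule "admit only blocks read with a RANDOM hint", are hypothetical choices made for this example.

```python
from collections import OrderedDict
from enum import Enum


class AccessHint(Enum):
    """Hypothetical semantic hints a query engine could attach to a read."""
    RANDOM = "random"          # e.g. index lookups, likely to be re-read
    SEQUENTIAL = "sequential"  # e.g. table scans, which an HDD streams well


class HybridStore:
    """Toy two-level store: a small LRU 'SSD' cache in front of an 'HDD'."""

    def __init__(self, hdd_blocks, ssd_capacity):
        self.hdd = dict(hdd_blocks)   # stand-in for the slow, large device
        self.ssd = OrderedDict()      # stand-in for the fast, small device
        self.ssd_capacity = ssd_capacity
        self.ssd_hits = 0
        self.hdd_reads = 0

    def read(self, block_id, hint):
        # Serve from the SSD level when possible and refresh LRU order.
        if block_id in self.ssd:
            self.ssd.move_to_end(block_id)
            self.ssd_hits += 1
            return self.ssd[block_id]
        # Otherwise fall through to the HDD level.
        data = self.hdd[block_id]
        self.hdd_reads += 1
        # Assumed admission rule: only randomly accessed blocks are cached,
        # so a large scan cannot evict the hot working set.
        if hint is AccessHint.RANDOM:
            self._admit(block_id, data)
        return data

    def _admit(self, block_id, data):
        self.ssd[block_id] = data
        if len(self.ssd) > self.ssd_capacity:
            self.ssd.popitem(last=False)  # evict the least recently used block


def demo():
    store = HybridStore({i: f"block-{i}" for i in range(100)}, ssd_capacity=4)
    for block_id in (1, 2, 1, 2):          # hot blocks, read at random
        store.read(block_id, AccessHint.RANDOM)
    for block_id in range(10, 60):         # one long scan
        store.read(block_id, AccessHint.SEQUENTIAL)
    store.read(1, AccessHint.RANDOM)       # hot block is still cached
    return store.ssd_hits, store.hdd_reads, sorted(store.ssd)
```

In this sketch the scan of 50 blocks bypasses the cache entirely, so the two hot blocks remain in the fast level afterwards. A hint-free LRU cache of the same size would have evicted them during the scan.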

Author information

Authors and Affiliations

Department of Computer Science and Technology, University of Science and Technology of China, Hefei, 230027, China
Kai Zhang
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, U.S.A.
Kai Zhang, Rubao Lee, Kaibo Wang, Yuan Yuan & Xiaodong Zhang
Department of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, 70803, U.S.A.
Feng Chen
Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, 07102, U.S.A.
Xiaoning Ding
Databricks Inc., San Francisco, CA, 94105, U.S.A.
Yin Huai
VMware Inc., Palo Alto, CA, 94304, U.S.A.
Tian Luo

Corresponding author

Correspondence to Xiaodong Zhang.

Additional information

This work was supported in part by the National Science Foundation of USA under Grant Nos. CCF-0913050, OCI-1147522, and CNS-1162165.

About this article

Cite this article

Zhang, K., Chen, F., Ding, X. et al. Hetero-DB: Next Generation High-Performance Database Systems by Best Utilizing Heterogeneous Computing and Storage Resources. J. Comput. Sci. Technol. 30, 657–678 (2015). https://doi.org/10.1007/s11390-015-1553-y

Received: 21 February 2015
Revised: 23 April 2015
Published: 08 July 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s11390-015-1553-y
