DOI: 10.1145/3358960.3375794
research-article

Modeling Analytics for Computational Storage

Published: 20 April 2020

ABSTRACT

Next-generation flash storage will be equipped with a substantial amount of computing power. In this paper, we investigate opportunities to use this computational capability to optimize Online Analytical Processing (OLAP) applications. We analyze the performance of a subset of TPC-DS queries on Hadoop clusters with two database engines, SPARK-SQL and Presto. We model the expected speed-up achieved by offloading a few operations that are executed first within most SQL plans. Offloading these operations requires minimal cooperation from the database engine and no changes to the existing plan. We show that the speed-up varies significantly among queries and between engines, and that the queries benefiting the most are I/O-heavy, highly selective queries of the "needle in the haystack" variety. Our main contribution is an estimate of the speed-up anticipated from pushing the execution of a few key SQL building blocks (scan, filter, and project operations) to computational storage when using read-optimized, columnar Parquet files.
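The abstract does not reproduce the paper's model, so the following is only a generic Amdahl's-law sketch of why the benefit of offloading depends so strongly on the query. It is not the authors' method. The function name `offload_speedup` and both of its parameters are assumptions introduced for illustration: the share of baseline query time spent in the offloaded operations, and how much faster those operations are assumed to run inside the storage device.

```python
def offload_speedup(offload_fraction: float, offload_factor: float) -> float:
    """Amdahl-style estimate of whole-query speed-up (illustrative only).

    offload_fraction: assumed share of baseline query time spent in the
        offloaded operations (scan, filter, project), between 0 and 1.
    offload_factor: assumed ratio by which those operations run faster in
        computational storage than on the host; must be positive.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    if offload_factor <= 0.0:
        raise ValueError("offload_factor must be positive")
    # Time left after offloading, as a fraction of the baseline time:
    # the untouched part plus the accelerated part.
    remaining = (1.0 - offload_fraction) + offload_fraction / offload_factor
    return 1.0 / remaining


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from the paper.
    for name, fraction in [("I/O-heavy query", 0.8), ("compute-heavy query", 0.2)]:
        print(f"{name}: {offload_speedup(fraction, 4.0):.2f}x")
```

Under these assumed inputs, a query that spends most of its time in scan, filter, and project gains far more than one dominated by joins or aggregation, which is consistent with the abstract's observation that speed-up varies widely among queries.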


Published in

ICPE '20: Proceedings of the ACM/SPEC International Conference on Performance Engineering
April 2020
319 pages
ISBN: 9781450369916
DOI: 10.1145/3358960

Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ICPE '20 paper acceptance rate: 15 of 62 submissions, 24%. Overall acceptance rate: 252 of 851 submissions, 30%.
