DOI: 10.1145/3448016.3457563
research-article

Bringing Cloud-Native Storage to SAP IQ

Published: 18 June 2021

ABSTRACT

In this paper, we describe our journey of transforming SAP IQ into a relational database management system (RDBMS) that utilizes cheap, elastically scalable object stores on the cloud. SAP IQ is a three-decade-old, disk-based, columnar RDBMS that is optimized for complex online analytical processing (OLAP) workloads. Traditionally, SAP IQ has been designed to operate on shared storage devices with strong consistency guarantees (e.g., high-caliber storage area network devices). Therefore, deploying SAP IQ on the cloud, as is, would have meant utilizing storage solutions such as NetApp or AWS EFS that provide a POSIX-compliant file interface and strong consistency guarantees, but at a much higher monetary cost. These costs can easily accumulate and erode the economies of scale that one would expect on the cloud. Instead, we have enhanced the design of SAP IQ to operate on cloud object stores such as AWS S3 and Azure Blob Storage. Object stores rely on a weaker consistency model and potentially have higher latency; however, because of these design trade-offs, they are able to offer (i) better pricing, (ii) enhanced durability, (iii) improved elasticity, and (iv) higher throughput. By enhancing SAP IQ to operate under these design trade-offs, we have unlocked many of the opportunities offered by object stores. More specifically, we have extended SAP IQ's buffer manager and transaction manager, and have introduced a new caching layer that utilizes instance storage on AWS EC2. Experiments using the TPC-H benchmark demonstrate that we can gain an order of magnitude reduction in data-at-rest storage costs while improving query and load performance.

Supplemental Material

3448016.3457563.mp4 (mp4, 52.8 MB)

        • Published in

          SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
          June 2021
          2969 pages
          ISBN: 9781450383431
          DOI: 10.1145/3448016

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 785 of 4,003 submissions, 20%
