ABSTRACT
In this paper, we describe our journey of transforming SAP IQ into a relational database management system (RDBMS) that utilizes cheap, elastically scalable object stores on the cloud. SAP IQ is a three-decade-old, disk-based, columnar RDBMS that is optimized for complex online analytical processing (OLAP) workloads. Traditionally, SAP IQ has been designed to operate on shared storage devices with strong consistency guarantees (e.g., high-caliber storage area network devices). Therefore, deploying SAP IQ on the cloud, as is, would have meant utilizing storage solutions such as NetApp or AWS EFS that provide a POSIX-compliant file interface and strong consistency guarantees, but at a much higher monetary cost. These costs can quickly accumulate and erode the economies of scale that one would expect on the cloud. Instead, we have enhanced the design of SAP IQ to operate on cloud object stores such as AWS S3 and Azure Blob Storage. Object stores rely on a weaker consistency model and potentially have higher latency; however, because of these design trade-offs, they are able to offer (i) better pricing, (ii) enhanced durability, (iii) improved elasticity, and (iv) higher throughput. By enhancing SAP IQ to operate under these design trade-offs, we have unlocked many of the opportunities offered by object stores. More specifically, we have extended SAP IQ's buffer manager and transaction manager, and have introduced a new caching layer that utilizes instance storage on AWS EC2. Experiments using the TPC-H benchmark demonstrate that we can gain an order of magnitude reduction in data-at-rest storage costs while improving query and load performance.
Index Terms
- Bringing Cloud-Native Storage to SAP IQ