DOI: 10.1145/2588555.2595637
research-article

Orca: a modular query optimizer architecture for big data

Published: 18 June 2014

Abstract

The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper, we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development that unites state-of-the-art query optimization technology with our own original research, resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN: 9781450323765
DOI: 10.1145/2588555
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cost model
  2. mpp
  3. parallel processing
  4. query optimization

Conference

SIGMOD/PODS'14
Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%
