Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version)

Wang, Zuozhi; Zeng, Kai; Huang, Botong; Chen, Wei; Cui, Xiaozong; Wang, Bo; Liu, Ji; Fan, Liya; Qu, Dachuan; Hou, Zhenyu; Guan, Tao; Li, Chen; Zhou, Jingren

doi:10.1007/s00778-023-00785-1

Regular Paper
Published: 20 March 2023

Volume 32, pages 1315–1342 (2023)
The VLDB Journal

Zuozhi Wang¹,
Kai Zeng²,
Botong Huang² (ORCID: orcid.org/0000-0001-7870-4997),
Wei Chen²,
Xiaozong Cui²,
Bo Wang²,
Ji Liu²,
Liya Fan²,
Dachuan Qu²,
Zhenyu Hou²,
Tao Guan²,
Chen Li¹ &
Jingren Zhou²

Abstract

Incremental processing is widely adopted in many applications, ranging from incremental view maintenance and stream computing to the recently emerging progressive data warehouse and intermittent query processing. Although many algorithms have been developed on this topic, none of them can produce an incremental plan that always achieves the best performance, since the optimal plan is data dependent. In this paper, we develop a novel cost-based optimizer framework, called Tempura, for optimizing incremental data processing. We propose an incremental query planning model called TIP, based on the concept of time-varying relations, which can formally model incremental processing in its most general form. We give a full specification of Tempura, which can not only unify various existing techniques to generate an optimal incremental plan, but also allow developers to add their own rewrite rules. We study how to explore the plan space and search for an optimal incremental plan. We evaluate Tempura in various incremental processing scenarios to show its effectiveness and efficiency.
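The idea of a time-varying relation can be illustrated with a minimal sketch. It assumes, purely for illustration and not as TIP's actual definition, that a relation is stored as a log of timestamped deltas, each mapping a row to a signed multiplicity, so that the relation at time t is the sum of all deltas up to t. A SUM-by-key query can then be answered either by a batch plan that recomputes from the snapshot or by an incremental plan that folds each delta into the previous result; both must agree at every time point. The functions `snapshot`, `sum_by_key`, and `apply_delta` are hypothetical names.

```python
from collections import Counter, defaultdict

# Illustrative assumption: a time-varying relation is a list of
# (timestamp, delta) pairs, where a delta maps a row tuple (key, value) to a
# signed multiplicity (+1 = insertion, -1 = deletion).

def snapshot(log, t):
    """Materialize the relation as of time t by summing every delta with ts <= t."""
    state = Counter()
    for ts, delta in log:
        if ts <= t:
            for row, mult in delta.items():
                state[row] += mult
    # Rows whose insertions and deletions cancel out are no longer present.
    return {row: m for row, m in state.items() if m != 0}

def sum_by_key(rel):
    """Batch plan: SUM(value) GROUP BY key, recomputed from a full snapshot."""
    out = defaultdict(int)
    for (key, value), mult in rel.items():
        out[key] += value * mult
    return dict(out)

def apply_delta(result, delta):
    """Incremental plan: fold a single delta into the previous SUM result."""
    new = dict(result)
    for (key, value), mult in delta.items():
        new[key] = new.get(key, 0) + value * mult
    return new

log = [
    (1, {("a", 10): 1, ("b", 5): 1}),
    (2, {("a", 7): 1}),
    (3, {("b", 5): -1, ("b", 8): 1}),  # an update modeled as delete + insert
]

result = {}
for t, delta in log:
    result = apply_delta(result, delta)
    # At every time point the incremental result equals full recomputation.
    assert result == sum_by_key(snapshot(log, t))

print(result)  # {'a': 17, 'b': 8}
```

Which plan is cheaper depends on how large each delta is relative to the accumulated state, which is one intuition for why the optimal plan is data dependent. This sketch also keeps a group after all of its rows are deleted; handling that case needs extra bookkeeping.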

Notes

Note that Final also needs to filter out empty groups with zero contributing tuples. We omit this detail for simplicity.
Here, we do not assume that o_id is the primary key of returns; that is, returns could contain multiple records for a returned order, for example due to different costs such as shipping cost, product damage, and inventory carrying cost.
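The two notes above can be illustrated together with a minimal sketch. It assumes, hypothetically and not as the paper's operator definition, that an incrementally maintained SUM keeps both a running sum and a count of contributing tuples per group, and that a finalization step, playing the role the first note assigns to Final, emits only groups whose count is positive. Because returns may hold several records for the same o_id, the group key is o_id and the sum adds up all of its cost records. The row layout `(o_id, cost_type, cost)` and the functions `new_state`, `update`, and `final` are illustrative assumptions.

```python
from collections import defaultdict

def new_state():
    # Hypothetical partial state per o_id: [running SUM(cost), contributing tuples].
    return defaultdict(lambda: [0, 0])

def update(state, delta):
    """Fold signed changes to returns into the partial state.

    Each change is ((o_id, cost_type, cost), mult) with mult = +1 for an
    inserted record and -1 for a deleted one. Several records may share
    the same o_id, since o_id is not assumed to be a primary key.
    """
    for (o_id, _cost_type, cost), mult in delta:
        acc = state[o_id]
        acc[0] += cost * mult
        acc[1] += mult
    return state

def final(state):
    """Emit SUM(cost) per o_id, skipping groups with zero contributing tuples."""
    return {o_id: total for o_id, (total, count) in state.items() if count > 0}

state = new_state()
update(state, [((1, "shipping", 12), 1),
               ((1, "damage", 30), 1),
               ((2, "shipping", 0), 1)])
print(final(state))  # {1: 42, 2: 0}

# Retract both records of order 1: its group becomes empty and is filtered out.
update(state, [((1, "shipping", 12), -1), ((1, "damage", 30), -1)])
print(final(state))  # {2: 0}
```

Order 2 shows why the count, not the sum, decides whether a group is empty: its total cost is 0, yet it still has a contributing record, so testing the sum alone would wrongly drop it.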

Author information

Authors and Affiliations

University of California, Irvine, USA
Zuozhi Wang & Chen Li
Alibaba Group, Hangzhou, China
Kai Zeng, Botong Huang, Wei Chen, Xiaozong Cui, Bo Wang, Ji Liu, Liya Fan, Dachuan Qu, Zhenyu Hou, Tao Guan & Jingren Zhou

Corresponding author

Correspondence to Botong Huang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Z., Zeng, K., Huang, B. et al. Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version). The VLDB Journal 32, 1315–1342 (2023). https://doi.org/10.1007/s00778-023-00785-1

Received: 20 January 2022
Revised: 14 January 2023
Accepted: 25 January 2023
Published: 20 March 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s00778-023-00785-1
