Coral: federated query join order optimization based on deep reinforcement learning

Gu, Rong; Zhang, Yi; Yin, Liangliang; Song, Lingyi; Huang, Wenjie; Yuan, Chunfeng; Wang, Zhaokang; Zhu, Guanghui; Huang, Yihua

doi:10.1007/s11280-023-01156-0

Published: 12 June 2023

Volume 26, pages 3093–3118, (2023)

World Wide Web

Rong Gu ORCID: orcid.org/0000-0002-1565-9997¹,
Yi Zhang¹,
Liangliang Yin¹,
Lingyi Song¹,
Wenjie Huang¹,
Chunfeng Yuan¹,
Zhaokang Wang¹,²,
Guanghui Zhu¹ &
…
Yihua Huang¹

Abstract

The rise of diversified data engines has created the need for federated queries. A federated query analyzes data that resides in several data engines through a single query. Because the data originates from multiple engines, federated queries usually rely on join operations and data migration, and they often take a long time to complete. The challenges of optimizing federated queries lie in join order selection and data migration coordination. However, enumerating all join orders is impractical because the number of join orders grows exponentially with the number of relations to be joined. To improve the performance of federated queries, we present a deep reinforcement learning-based approach to optimizing join order and join engine selection for federated queries, and we design a deep Q-network-based (DQN-based) optimizer. The DQN-based optimizer generates join search policies that optimize join order selection for datasets under a given cost model. Based on this optimizer, we implement Coral, a federated query system that optimizes the join order selection of federated queries. With the optimized join order, Coral transforms a federated query into a set of subqueries that are assigned to and executed on different data engines. We also propose a subquery cache optimization to optimize data migration during query execution. Extensive experimental evaluation demonstrates that Coral significantly reduces the latency of federated queries and achieves a speedup of up to 5.03$\times$ over cutting-edge federated query systems.
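
To make the idea of learning a join search policy against a cost model concrete, the following is a minimal sketch and not Coral's implementation. It replaces the deep Q-network with tabular Q-learning, considers only left-deep join orders (of which there are n! for n relations), and omits join engine selection and the subquery cache. The relation names, cardinalities, selectivities, the toy cost model (sum of intermediate result sizes), and all function names are assumptions made for illustration.

```python
import itertools
import random

# Hypothetical toy inputs: relation cardinalities and pairwise join selectivities.
CARD = {"orders": 1000, "customer": 100, "lineitem": 5000, "nation": 25}
SEL = {
    frozenset(("orders", "customer")): 0.01,
    frozenset(("orders", "lineitem")): 0.001,
    frozenset(("customer", "nation")): 0.04,
}


def result_rows(rels):
    """Estimated size of the join of `rels` (a missing selectivity means a cross product)."""
    rels = sorted(rels)
    rows = 1.0
    for r in rels:
        rows *= CARD[r]
    for a, b in itertools.combinations(rels, 2):
        rows *= SEL.get(frozenset((a, b)), 1.0)
    return rows


def step_cost(joined, rel):
    """Cost of adding `rel` to the already joined set; picking the first relation is free."""
    return result_rows(joined | {rel}) if joined else 0.0


def plan_cost(order):
    """Toy cost model: sum of intermediate result sizes of a left-deep plan."""
    joined, total = frozenset(), 0.0
    for rel in order:
        total += step_cost(joined, rel)
        joined = joined | {rel}
    return total


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning: state = set of joined relations, action = next relation."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated negative cost-to-go
    relations = sorted(CARD)
    for _ in range(episodes):
        state = frozenset()
        while len(state) < len(relations):
            actions = [r for r in relations if r not in state]
            if rng.random() < epsilon:  # explore
                action = rng.choice(actions)
            else:  # exploit the current estimates
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            nxt = state | {action}
            reward = -step_cost(state, action)
            remaining = [r for r in relations if r not in nxt]
            future = max((q.get((nxt, a), 0.0) for a in remaining), default=0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
    return q


def greedy_order(q):
    """Read a join order out of the learned Q-table by always taking the best action."""
    relations = sorted(CARD)
    state, order = frozenset(), []
    while len(order) < len(relations):
        actions = [r for r in relations if r not in state]
        best = max(actions, key=lambda a: q.get((state, a), 0.0))
        order.append(best)
        state = state | {best}
    return order


learned = greedy_order(train())
optimal = min(itertools.permutations(sorted(CARD)), key=plan_cost)
print("learned:", learned, plan_cost(learned))
print("optimal:", list(optimal), plan_cost(optimal))
```

With four relations the brute-force search in the last lines is cheap, so it serves as a reference for the learned order. Exhaustive enumeration becomes infeasible as the number of relations grows, which is the situation in which a learned policy is meant to help.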

Notes

Coral is open sourced at https://github.com/PasaLab/Coral

Acknowledgements

We thank the anonymous reviewers for their comments. This work is funded in part by the China National Science Foundation (Grant No. 62072230), the Open Project of the State Key Laboratory for Novel Software Technology (Grant No. KFKT2021B33), the Jiangsu Province Science and Technology Key Program (Grant No. BE2021729), the Fundamental Research Funds for the Central Universities (Nos. 020214380089 and 020214380098), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Rong Gu, Yi Zhang, Liangliang Yin, Lingyi Song, Wenjie Huang, Chunfeng Yuan, Zhaokang Wang, Guanghui Zhu & Yihua Huang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Zhaokang Wang

Corresponding authors

Correspondence to Rong Gu, Zhaokang Wang, Guanghui Zhu or Yihua Huang.

Ethics declarations

Conflicts of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gu, R., Zhang, Y., Yin, L. et al. Coral: federated query join order optimization based on deep reinforcement learning. World Wide Web 26, 3093–3118 (2023). https://doi.org/10.1007/s11280-023-01156-0

Received: 12 October 2021
Revised: 16 November 2022
Accepted: 21 February 2023
Published: 12 June 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11280-023-01156-0
