Skip to main content
Log in

Coral: federated query join order optimization based on deep reinforcement learning

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

The rise of diversified data engines has created the need for federated queries. A federated query can take a query and provide data analysis based on data from various data engines. Since the query data originates from multiple data engines, federated queries usually rely on join operation and data migration to complete the query and take a long time. The challenges of optimizing federated queries lie on join order selection and data migration coordination. However, enumerating all join orders is impractical because the set of join orders grows exponentially with the number of relations to be joined. To improve the performance of federated queries, we present a deep reinforcement learning-based approach on optimizing join order and join engine selection for federated queries and design an deep Q-network-based (DQN-based) optimizer. The DQN-based optimizer can generate join search policies that optimize the join order selection for datasets with a given cost model. Based on the DQN-based optimizer, we implement a federated query system Coral which can provide optimization for join order selection of federated queries. With the optimized join order, Coral can transform a federated query into a set of subqueries which will be assigned to and executed on different data engines. We also propose a subquery cache optimization to optimize data migration during the query execution. The extensive experimental evaluation demonstrates that Coral can significantly reduce the query latency of federated queries and achieve a speedup of up to 5.03\(\times \) compared to the cutting-edge federated query systems.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Notes

  1. Coral is open sourced at https://github.com/PasaLab/Coral

References

  1. Clickhouse. https://clickhouse.tech (2021). Accessed 20 Sep 2021

  2. Elasticsearch. https://www.elastic.co (2021). Accessed 20 Sep 2021

  3. Shamgunov, N.: The MemsQL in-memory database system. In: Proceedings of the 2nd International Workshop on In Memory Data Management and Analytics (IMDM ’14), p. 106 (2014)

  4. Flink. https://flink.apache.org (2021). Accessed 21 Sep 2021

  5. The dzone guide to data persistence. https://dzone.com/guides/data-persistence-2 (2021). Accessed 21 Sep 2021

  6. Xu, L., Cole, R.L., Ting, D.: Learning to optimize federated queries. In: Proceedings of the 2nd ACM International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiMD’19), pp. 1–7 (2019)

  7. Giannakouris, V., Papailiou, N., Tsoumakos, D., Koziris, N.: MuSQLE: Distributed SQL query execution over multiple engine environments. In: Proceedings of the 4th IEEE International Conference on Big Data (BigData ’16), pp. 452–461 (2016)

  8. Duggan, J., Elmore, A.J., Stonebraker, M., Balazinska, M., Howe, B., Kepner, J., Madden, S., Maier, D., Mattson, T., Zdonik, S.B.: The BigDAWG polystore system. ACM SIGMOD Record 44(2), 11–16 (2015)

    Article  Google Scholar 

  9. LeFevre, J., Sankaranarayanan, J., Hacigümüs, H., Tatemura, J., Polyzotis, N., Carey, M.J.: MISO: souping up big data query processing with a multistore system. In: Proceedings of the 33rd ACM International Conference on Management of Data (SIGMOD ’14), pp. 1591–1602 (2014)

  10. Vogt, M., Stiemer, A., Schuldt, H.: ICARUS: Towards a multistore database system. In: Proceedings of the 5th IEEE International Conference on Big Data (BigData ’17), pp. 2490–2499 (2017)

  11. Ying. Research and implementation on cross-platform unified big data SQL query system. Master’s thesis, Nanjing University (2019)

  12. Begoli, E., Camacho-Rodríguez, J., Hyde, J., Mior, M.J., Lemire, D.: Apache calcite: A foundational framework for optimized query processing over heterogeneous data sources. In: Proceedings of the 37th ACM International Conference on Management of Data (SIGMOD ’18), pp. 221–230 (2018)

  13. Postgresql. https://www.postgresql.org (2021). Accessed 3 Oct 2021

  14. Apache spark. http://spark.apache.org (2021). Accessed 3 Oct 2021

  15. Prestodb. https://prestodb.github.io/ (2021). Accessed 3 Oct 2021

  16. Kostas, T., Sellis, T., Jensen, C.S.: A reinforcement learning approach for adaptive query processing. Technical Report (2008)

  17. Marcus, R., Papaemmanouil, O.: Deep reinforcement learning for join order enumeration. In: Proceedings of the 1st ACM International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiMD ’18), pp. 3:1–3:4 (2018)

  18. Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J.M., Stoica, I.: Learning to optimize join queries with deep reinforcement learning. arXiv:1808.03196 (2018)

  19. Shi, H., Liu, S., Wu, H., Li, R., Liu, S., Kwok, N., Peng, Y.: Oscillatory particle swarm optimizer. Appl. Soft Comput. 73, 316–327 (2018)

    Article  Google Scholar 

  20. Ying, C., Ying, C., Ban, C.: A performance optimization strategy based on degree of parallelism and allocation fitness. EURASIP J. Wirel. Commun. Netw. 2018(1), 1–8 (2018)

    Article  Google Scholar 

  21. Yan, W., Li, G., Wu, Z., Wang, S., Yu, P.S.: Extracting diverse-shapelets for early classification on time series. World Wide Web 23(6), 3055–3081 (2020)

    Article  Google Scholar 

  22. Wu, Z., Cao, Z., Wang, Y.: Multimedia selection operation placement. Multimed. Tools Appl. 54(1), 69–96 (2011)

    Article  Google Scholar 

  23. Wu, Z., Shen, S., Zhou, H., Li, H., Lu, Z., Zou, D.: An effective approach for the protection of user commodity viewing privacy in e-commerce website. Knowl.-Based Syst. 220, 106952 (2021)

    Article  Google Scholar 

  24. Wu, Z., Li, G., Shen, S., Lian, X., Chen, E., Xu, G.: Constructing dummy query sequences to protect location privacy and query privacy in location-based services. World Wide Web 24(1), 25–49 (2021)

    Article  Google Scholar 

  25. Wu, Z., Shen, S., Lian, X., Su, X., Chen, E.: A dummy-based user privacy protection approach for text information retrieval. Knowl.-Based Syst. 195, 105679 (2020)

    Article  Google Scholar 

  26. Yu, X., Li, G., Chai, C., Tang, N.: Reinforcement learning with tree-LSTM for join order selection. In: Proceedings of the 36th IEEE International Conference on Data Engineering (ICDE ’20), pp. 1297–1308 (2020)

  27. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M.A., Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

    Article  Google Scholar 

  28. Graefe, G.: Rule-Based Query Optimization in Extensible Database Systems. PhD thesis, University of Wisconsin-Madison (1987)

  29. Goetz, G.: The cascades framework for query optimization. IEEE Database Eng. Bull. 18(3), 19–29 (1995)

    Google Scholar 

  30. Goldstein, J., Larson, P.Å.: Optimizing queries using materialized views: A practical, scalable solution. In: Proceedings of the 20th ACM International Conference on Management of Data (SIGMOD ’01), pp. 331–342 (2001)

  31. Tpc-h – homepage. http://www.tpc.org/tpch/ (2021). Accessed 5 Oct 2021

  32. Shapiro, S.S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52(3/4), 591–611 (1965)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

We would like to appreciate the comments from the anonymous reviewers. This work is funded in part by the China National Science Foundation (Grant No.62072230), Open Project of State Key Laboratory for Novel Software Technology (Grant No. KFKT2021B33), Jiangsu Province Science and Technology Key Program (grant number BE2021729), the Fundamental Research Funds for the Central Universities (No. 020214380089, 020214380098), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Rong Gu, Zhaokang Wang, Guanghui Zhu or Yihua Huang.

Ethics declarations

Conflicts of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gu, R., Zhang, Y., Yin, L. et al. Coral: federated query join order optimization based on deep reinforcement learning. World Wide Web 26, 3093–3118 (2023). https://doi.org/10.1007/s11280-023-01156-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-023-01156-0

Keywords

Navigation