Abstract
In this paper, we present a tool for querying relational databases (DBs) that uses a knowledge graph (KG) to generate SQL queries from natural language (NL) specifications. We argue that a KG representation of a relational DB schema can serve as an auxiliary tool in the translation process, and we propose to automate the generation of such a KG. Our approach to providing an NL interface for relational DBs comprises two major tasks: (1) generation of a KG from a relational DB schema and (2) translation of NL queries to SQL based on the semantics provided by the respective KG. We study the effectiveness of our approach using a benchmark of 82 NL query examples from the Spider dataset, restricted to the Formula 1 domain. Our approach correctly translates these queries, as verified against the expected results provided by the benchmark.
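To make the two tasks concrete, the following is a minimal sketch and not the implementation described in the chapter. It assumes a hypothetical three-table schema loosely inspired by the Formula 1 domain, represents the KG as plain (subject, predicate, object) tuples with made-up `ex:` predicates, and assumes that the NL terms of a question have already been mapped to columns. The names `SCHEMA`, `schema_to_triples`, `join_path`, and `build_sql` are illustrative only.

```python
from collections import deque

# Hypothetical toy schema (an assumption for illustration, not the Spider schema).
SCHEMA = {
    "drivers": {"columns": ["driver_id", "surname", "nationality"], "foreign_keys": {}},
    "races": {"columns": ["race_id", "name", "year"], "foreign_keys": {}},
    "results": {
        "columns": ["result_id", "race_id", "driver_id", "position"],
        "foreign_keys": {
            "race_id": ("races", "race_id"),
            "driver_id": ("drivers", "driver_id"),
        },
    },
}


def schema_to_triples(schema):
    """Task 1 (sketch): describe tables, columns, and foreign keys as triples."""
    triples = []
    for table, meta in schema.items():
        triples.append((table, "rdf:type", "ex:Table"))
        for col in meta["columns"]:
            triples.append((f"{table}.{col}", "ex:columnOf", table))
        for col, (ref_table, ref_col) in meta["foreign_keys"].items():
            triples.append((f"{table}.{col}", "ex:references", f"{ref_table}.{ref_col}"))
    return triples


def join_path(triples, source, target):
    """Breadth-first search over foreign-key edges, treated as undirected."""
    edges = {}
    for subj, pred, obj in triples:
        if pred == "ex:references":
            left, right = subj.split(".")[0], obj.split(".")[0]
            condition = f"{subj} = {obj}"
            edges.setdefault(left, []).append((right, condition))
            edges.setdefault(right, []).append((left, condition))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for nxt, condition in edges.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, condition)]))
    return None


def build_sql(triples, select_col, filter_col, value):
    """Task 2 (sketch): assemble a SELECT once NL terms are mapped to columns."""
    src = select_col.split(".")[0]
    dst = filter_col.split(".")[0]
    path = join_path(triples, src, dst)
    if path is None:
        raise ValueError(f"no join path between {src} and {dst}")
    sql = f"SELECT {select_col} FROM {src}"
    for table, condition in path:
        sql += f" JOIN {table} ON {condition}"
    # repr() is used only to keep the sketch short; real code should bind parameters.
    return sql + f" WHERE {filter_col} = {value!r}"


if __name__ == "__main__":
    kg = schema_to_triples(SCHEMA)
    # A question such as "surnames of drivers who raced in 2009", after term mapping.
    print(build_sql(kg, "drivers.surname", "races.year", 2009))
```

The point of the sketch is that the join path between `drivers` and `races` is not stated in the question; it is recovered from the foreign-key edges stored in the graph, which is the kind of schema semantics the KG is meant to supply to the translation step.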
Notes
1. This is a view that describes the structure of all tables existing in the DB.
2. This is a view that describes all columns present in the DB tables.
3. Resource Description Framework (RDF) is a standard model for data interchange on the Web. It serves as a data model for generating KGs, enabling the representation of structured, linked, and semantic knowledge.
4. Available at: https://stanfordnlp.github.io/stanza/.
5. Available at: https://pypi.org/project/pymysql/.
6. Pandas is a Python library for data analysis and manipulation. Available at: https://pandas.pydata.org/.
7. A two-dimensional data structure provided by the Pandas library, similar to a table, in which the data is organized in rows and columns.
8. Available at: https://openai.com/blog/chatgpt.
Acknowledgment
This work is supported by the authors’ research grants from CAPES and CNPq. The authors would like to thank the reviewers for their comments on this work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Campêlo, R.A., Laender, A.H.F., da Silva, A.S. (2023). Using Knowledge Graphs to Generate SQL Queries from Textual Specifications. In: Sales, T.P., Araújo, J., Borbinha, J., Guizzardi, G. (eds) Advances in Conceptual Modeling. ER 2023. Lecture Notes in Computer Science, vol 14319. Springer, Cham. https://doi.org/10.1007/978-3-031-47112-4_8
DOI: https://doi.org/10.1007/978-3-031-47112-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47111-7
Online ISBN: 978-3-031-47112-4
eBook Packages: Computer Science, Computer Science (R0)