Using Knowledge Graphs to Generate SQL Queries from Textual Specifications

  • Conference paper
Advances in Conceptual Modeling (ER 2023)

Abstract

In this paper, we present a tool for querying relational databases (DBs) that uses a knowledge graph (KG) to generate SQL queries from natural language (NL) specifications. We argue that a KG representation of a relational DB schema can serve as an auxiliary tool in the translation process, and we propose to automate the generation of such a KG. Our approach to providing an NL interface for relational DBs comprises two major tasks: (1) generating a KG from a relational DB schema and (2) translating NL queries to SQL based on the semantics provided by the respective KG. We evaluate the effectiveness of our approach on a benchmark of 82 NL query examples from the Spider dataset, drawn from the Formula 1 domain. Our approach correctly translates these queries, as verified against the expected results provided by the benchmark.
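Task (1) above, generating a KG from a relational DB schema, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' implementation: the schema excerpt is hardcoded (table and column names loosely modeled on a Formula 1 schema), the namespace `http://example.org/db/` and the predicates `belongsTo`, `dataType` and `references` are hypothetical, and triples are written directly as N-Triples strings instead of going through an RDF library. Only `rdf:type` and `rdfs:label` are standard RDF vocabulary.

```python
# Hypothetical excerpt of a Formula 1-style schema. In practice these rows
# would be read from the DB's catalog views (e.g. MySQL's
# information_schema.TABLES and information_schema.COLUMNS).
TABLES = ["circuits", "races"]
COLUMNS = [  # (table, column, data type)
    ("circuits", "circuitId", "int"),
    ("circuits", "name", "text"),
    ("circuits", "country", "text"),
    ("races", "raceId", "int"),
    ("races", "year", "int"),
    ("races", "name", "text"),
    ("races", "circuitId", "int"),
]
FOREIGN_KEYS = [  # (table, column, referenced table, referenced column)
    ("races", "circuitId", "circuits", "circuitId"),
]

BASE = "http://example.org/db/"  # hypothetical namespace for schema nodes
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def iri(*parts):
    """Build an N-Triples IRI such as <http://example.org/db/races/year>."""
    return "<" + BASE + "/".join(parts) + ">"


def schema_to_triples(tables, columns, foreign_keys):
    """Turn schema metadata into (subject, predicate, object) triples."""
    triples = []
    for table in tables:
        triples.append((iri(table), RDF_TYPE, iri("Table")))
        triples.append((iri(table), RDFS_LABEL, f'"{table}"'))
    for table, column, dtype in columns:
        node = iri(table, column)
        triples.append((node, RDF_TYPE, iri("Column")))
        triples.append((node, iri("belongsTo"), iri(table)))
        triples.append((node, iri("dataType"), f'"{dtype}"'))
        triples.append((node, RDFS_LABEL, f'"{column}"'))
    for src_table, src_col, dst_table, dst_col in foreign_keys:
        # Foreign keys become edges between column nodes.
        triples.append((iri(src_table, src_col), iri("references"), iri(dst_table, dst_col)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines ("subject predicate object .")."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(schema_to_triples(TABLES, COLUMNS, FOREIGN_KEYS)))
```

One reason to model columns as first-class nodes, with foreign keys as `references` edges, is that a later translation step can then discover join paths simply by following edges in the graph.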

Notes

  1. This is a view that describes the structure of all tables in the DB.

  2. This is a view that describes all columns of the DB tables.

  3. Resource Description Framework (RDF) is a standard model for data interchange on the Web. It serves as the data model for generating KGs, enabling the representation of structured, linked, and semantic knowledge.

  4. Available at: https://stanfordnlp.github.io/stanza/.

  5. Available at: https://pypi.org/project/pymysql/.

  6. Pandas is a Python library for data analysis and manipulation. Available at: https://pandas.pydata.org/.

  7. A two-dimensional, table-like data structure provided by the Pandas library, in which data is organized in rows and columns.

  8. Available at: https://openai.com/blog/chatgpt.
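Task (2) from the abstract, translating an NL query to SQL with the help of the KG, can be illustrated with a toy slot-filling sketch. It is hypothetical and is not the authors' method: it assumes an upstream NLP step (for example a lemmatizer such as the Stanza toolkit referenced in note 4) has already reduced the question to three slots (a target lemma, a filter lemma, and a literal value), and the KG fragment, its plain-string node names, and the `synonym` edges linking lemmas to columns are all invented for illustration. The KG is used to resolve lemmas to columns, to find each column's table, and to find a foreign-key edge when a join is needed.

```python
# Hypothetical KG fragment for a Formula 1-style schema. Nodes use plain
# "table.column" strings instead of IRIs to keep the sketch short.
KG = [
    ("races.name", "belongsTo", "races"),
    ("races.year", "belongsTo", "races"),
    ("races.circuitId", "belongsTo", "races"),
    ("circuits.circuitId", "belongsTo", "circuits"),
    ("circuits.country", "belongsTo", "circuits"),
    ("races.circuitId", "references", "circuits.circuitId"),
    # Hypothetical lexical links from NL lemmas to columns.
    ("races.name", "synonym", "race"),
    ("races.year", "synonym", "year"),
    ("circuits.country", "synonym", "country"),
]


def lookup(lemma):
    """Map an NL lemma to the column linked to it by a 'synonym' edge."""
    for s, p, o in KG:
        if p == "synonym" and o == lemma:
            return s
    raise KeyError(f"no schema element for {lemma!r}")


def table_of(column):
    """Follow the 'belongsTo' edge from a column node to its table."""
    return next(o for s, p, o in KG if s == column and p == "belongsTo")


def join_condition(left, right):
    """Find a foreign-key ('references') edge connecting two tables."""
    for s, p, o in KG:
        if p == "references" and {table_of(s), table_of(o)} == {left, right}:
            return f"{s} = {o}"
    raise ValueError(f"no join path between {left} and {right}")


def sql_literal(value):
    """Quote strings as SQL literals (doubling embedded quotes)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def translate(target_lemma, filter_lemma, value):
    """Build SELECT target FROM ... WHERE filter = value from three slots."""
    target, condition = lookup(target_lemma), lookup(filter_lemma)
    t_target, t_cond = table_of(target), table_of(condition)
    sql = f"SELECT {target} FROM {t_target}"
    if t_cond != t_target:
        sql += f" JOIN {t_cond} ON {join_condition(t_target, t_cond)}"
    return sql + f" WHERE {condition} = {sql_literal(value)}"
```

For instance, `translate("race", "country", "Italy")` yields `SELECT races.name FROM races JOIN circuits ON races.circuitId = circuits.circuitId WHERE circuits.country = 'Italy'`: the join is added only because the KG places the target and filter columns in different tables and links those tables through a `references` edge.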

Acknowledgment

This work is supported by the authors’ research grants from CAPES and CNPq. The authors would like to thank the reviewers for their comments on this work.

Author information

Correspondence to Robson A. Campêlo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Campêlo, R.A., Laender, A.H.F., da Silva, A.S. (2023). Using Knowledge Graphs to Generate SQL Queries from Textual Specifications. In: Sales, T.P., Araújo, J., Borbinha, J., Guizzardi, G. (eds) Advances in Conceptual Modeling. ER 2023. Lecture Notes in Computer Science, vol 14319. Springer, Cham. https://doi.org/10.1007/978-3-031-47112-4_8

  • DOI: https://doi.org/10.1007/978-3-031-47112-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47111-7

  • Online ISBN: 978-3-031-47112-4

  • eBook Packages: Computer Science, Computer Science (R0)