CySpider: A Neural Semantic Parsing Corpus with Baseline Models for Property Graphs

Zhao, Ziyu; Liu, Wei; French, Tim; Stewart, Michael

doi:10.1007/978-981-99-8391-9_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14472)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Enterprise knowledge graphs are increasingly popular in industrial applications, creating a pressing demand for natural language interfaces that support non-technical end-users. For natural language queries over relational databases, the neural semantic parsing task Text-to-SQL achieves strong performance in translating text inputs into SQL queries. However, very few public corpora are available for training neural semantic parsing models that convert textual queries into graph query languages. In this research, we develop a generic SQL2Cypher algorithm that maps a SQL query to a set of Cypher clauses, where Cypher is the query language used by Neo4j, a popular property graph database. Each converted Cypher statement is then paired with the original natural language query to create a parallel corpus that enables end-to-end training of neural semantic parsing models for Text-to-Cypher. To evaluate the quality of the dataset, we construct a corresponding graph database and measure execution accuracy. In addition, the Text-to-Cypher corpus is accompanied by four transformer-based baseline models. The availability of such a corpus and baseline models is critical for developing and benchmarking new machine learning methods that advance natural language interfaces for fact retrieval from large graph-based knowledge repositories. The source code and dataset are available on GitHub (https://github.com/22842219/SemanticParser4Graph).
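To make the idea of mapping a SQL query to Cypher clauses more concrete, the following is a minimal sketch and not the authors' SQL2Cypher algorithm. It assumes a hypothetical schema mapping in which every table row becomes a node labelled with the table name and every column becomes a node property, and it handles only single-table SELECT ... FROM ... [WHERE ...] queries. The function names `sql_to_cypher` and `execution_accuracy`, and the multiset comparison of result rows, are illustrative assumptions.

```python
import re
from collections import Counter

# Matches only: SELECT <cols> FROM <table> [WHERE <condition>] [;]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_cypher(sql: str) -> str:
    """Translate a simple single-table SELECT into MATCH / WHERE / RETURN.

    Assumption (not from the paper): table name -> node label,
    column name -> node property.
    """
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError(f"unsupported SQL for this sketch: {sql!r}")

    table = match.group("table")
    var = table[0].lower()  # node variable, e.g. "s" for table "singer"
    cols = [c.strip() for c in match.group("cols").split(",")]

    clauses = [f"MATCH ({var}:{table})"]
    cond = match.group("cond")
    if cond:
        # Qualify the column on the left of each comparison with the node variable.
        cond = re.sub(r"\b(\w+)\s*(=|<>|<=|>=|<|>)", rf"{var}.\1 \2", cond)
        clauses.append(f"WHERE {cond}")
    clauses.append("RETURN " + ", ".join(f"{var}.{c}" for c in cols))
    return " ".join(clauses)


def execution_accuracy(predicted_results, gold_results) -> float:
    """Fraction of query pairs whose result rows agree as multisets.

    Each element of the two lists is the list of rows returned by one query.
    Row order is ignored; this comparison rule is an assumption of the sketch.
    """
    pairs = list(zip(predicted_results, gold_results))
    if not pairs:
        return 0.0
    hits = sum(
        Counter(map(tuple, pred)) == Counter(map(tuple, gold))
        for pred, gold in pairs
    )
    return hits / len(pairs)


if __name__ == "__main__":
    print(sql_to_cypher("SELECT name, age FROM singer WHERE age > 30"))
```

In a setup like the one the abstract describes, the rows returned by the SQL query on the relational database and by the converted Cypher query on the graph database would be the inputs to a comparison such as `execution_accuracy`. Joins, aggregation, nesting, and set operations are outside the scope of this toy example.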

Acknowledgment

This research is supported by the Australian Research Council through the Centre for Transforming Maintenance through Data Science (grant number IC180100030), funded by the Australian Government.

Author information

Authors and Affiliations

UWA NLP-TLP Group, 35 Stirling Hwy, Crawley, Perth, WA, 6009, Australia
Ziyu Zhao, Wei Liu, Tim French & Michael Stewart
School of Physics, Mathematics and Computing, The University of Western Australia, Crawley, Australia
Ziyu Zhao, Wei Liu, Tim French & Michael Stewart

Corresponding author

Correspondence to Ziyu Zhao.

Editor information

Editors and Affiliations

The University of Sydney, Darlington, NSW, Australia
Tongliang Liu
Monash University, Clayton, VIC, Australia
Geoff Webb
The University of Newcastle, Callaghan, NSW, Australia
Lin Yue
CSIRO Data61, Sydney, NSW, Australia
Dadong Wang

About this paper

Cite this paper

Zhao, Z., Liu, W., French, T., Stewart, M. (2024). CySpider: A Neural Semantic Parsing Corpus with Baseline Models for Property Graphs. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_10

DOI: https://doi.org/10.1007/978-981-99-8391-9_10
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)
