Abstract
Enterprise knowledge graphs are increasingly popular in industrial applications, creating a pressing demand for natural language interfaces that support non-technical end-users. For natural language queries to relational databases, the neural semantic parsing task Text-to-SQL achieves strong performance in translating text inputs to SQL queries. However, very few public corpora are available for training neural semantic parsing models that convert textual queries to graph query languages. In this research, we develop a generic SQL2Cypher algorithm that maps a SQL query to a set of Cypher clauses, where Cypher is the query language used by the popular property graph database Neo4j. The converted Cypher statement is then combined with the original natural language query to create a parallel corpus that enables end-to-end training of neural semantic parsing models for Text-to-Cypher. To evaluate the dataset quality, we construct a corresponding graph database to obtain execution accuracy. In addition, the Text-to-Cypher corpus is accompanied by four transformer-based baseline models. The availability of such a corpus and baseline models is critical for developing and benchmarking new machine learning methods that advance natural language interfaces for fact retrieval from large graph-based knowledge repositories. The source code and dataset are available on GitHub (https://github.com/22842219/SemanticParser4Graph).
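To make the idea of mapping a SQL query to Cypher clauses concrete, the following is a minimal sketch and not the SQL2Cypher algorithm of this paper. It assumes, purely for illustration, that each relational table becomes a node label and each column becomes a node property, and it handles only single-table SELECT ... FROM ... [WHERE ...] queries with simple comparisons. The function names sql_to_cypher and make_pair, and the record fields, are hypothetical.

```python
import re

# Toy pattern: SELECT <cols> FROM <table> [WHERE <cond>] [;]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)

# Comparison operators recognised inside the WHERE condition.
_COMPARISON = re.compile(r"\b(\w+)\s*(=|<>|<=|>=|<|>)")


def sql_to_cypher(sql: str) -> str:
    """Translate a single-table SELECT into MATCH / WHERE / RETURN clauses.

    Assumption (not from the paper): table -> node label, column -> property.
    String literals containing comparison operators are not handled.
    """
    m = _SELECT.match(sql)
    if m is None:
        raise ValueError("only SELECT ... FROM <table> [WHERE ...] is supported")

    table = m.group("table")
    var = table[0].lower()  # node variable, e.g. "s" for "singer"
    cols = [c.strip() for c in m.group("cols").split(",")]

    clauses = [f"MATCH ({var}:{table})"]

    cond = m.group("cond")
    if cond:
        # Qualify the column on the left of each comparison with the node variable.
        cond = _COMPARISON.sub(lambda mm: f"{var}.{mm.group(1)} {mm.group(2)}", cond)
        clauses.append(f"WHERE {cond}")

    if cols == ["*"]:
        clauses.append(f"RETURN {var}")  # SELECT * returns the whole node
    else:
        clauses.append("RETURN " + ", ".join(f"{var}.{c}" for c in cols))
    return "\n".join(clauses)


def make_pair(question: str, sql: str) -> dict:
    """Build one parallel-corpus record: question, source SQL, converted Cypher."""
    return {"question": question, "sql": sql, "cypher": sql_to_cypher(sql)}


if __name__ == "__main__":
    record = make_pair(
        "Which singers are older than 30?",
        "SELECT name, age FROM singer WHERE age > 30",
    )
    print(record["cypher"])
```

Pairing the converted statement with the original question, as make_pair does, mirrors the parallel-corpus construction described above. Joins, aggregation, and nested queries would need a real SQL parser and a schema-aware mapping, which this sketch deliberately omits.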
Acknowledgment
This research is supported by the Australian Research Council through the Centre for Transforming Maintenance through Data Science (grant number IC180100030), funded by the Australian Government.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, Z., Liu, W., French, T., Stewart, M. (2024). CySpider: A Neural Semantic Parsing Corpus with Baseline Models for Property Graphs. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_10
DOI: https://doi.org/10.1007/978-981-99-8391-9_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)