CySpider: A Neural Semantic Parsing Corpus with Baseline Models for Property Graphs

  • Conference paper
AI 2023: Advances in Artificial Intelligence (AI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14472)

Abstract

Enterprise knowledge graphs are increasingly popular in industrial applications, creating a pressing demand for natural language interfaces that support non-technical end-users. For natural language queries to relational databases, the neural semantic parsing task Text-to-SQL achieves strong performance in translating text inputs to SQL queries. However, very few public corpora are available for training neural semantic parsing models that convert textual queries to graph query languages. In this research, we develop a generic SQL2Cypher algorithm that maps a SQL query to a set of Cypher clauses, where Cypher is the query language of Neo4j, a popular property graph database. The converted Cypher statement is then paired with the original natural language query to create a parallel corpus that enables end-to-end training of neural semantic parsing models for Text-to-Cypher. To evaluate the dataset quality, we construct a corresponding graph database and measure execution accuracy. In addition, the Text-to-Cypher corpus is accompanied by four transformer-based baseline models. The availability of such a corpus and baseline models is critical for developing and benchmarking new machine learning methods that advance natural language interfaces for fact retrieval from large graph-based knowledge repositories. The source code and dataset are available on GitHub (https://github.com/22842219/SemanticParser4Graph).
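To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' SQL2Cypher algorithm. It assumes a toy rule that treats a table as a node label and columns as node properties, and it handles only single-table `SELECT ... FROM ... [WHERE col = value]` queries. The names `toy_sql_to_cypher`, `build_pair`, and `execution_accuracy` are hypothetical, and the order-insensitive comparison of result rows is an assumption about how execution accuracy might be computed.

```python
import re
from collections import Counter

# Hypothetical toy pattern: SELECT cols FROM table [WHERE col = value]
_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*(?P<val>'[^']*'|\d+))?\s*;?\s*$",
    re.IGNORECASE,
)


def toy_sql_to_cypher(sql: str, var: str = "n") -> str:
    """Map a very simple SQL query to MATCH / WHERE / RETURN clauses.

    Assumption: each table corresponds to a node label and each column
    to a node property. Joins, aggregation, and nesting are not handled.
    """
    m = _PATTERN.match(sql)
    if m is None:
        raise ValueError(f"unsupported SQL for this toy sketch: {sql!r}")
    clauses = [f"MATCH ({var}:{m.group('table')})"]
    if m.group("col"):
        clauses.append(f"WHERE {var}.{m.group('col')} = {m.group('val')}")
    cols = [c.strip() for c in m.group("cols").split(",")]
    # SELECT * returns the whole node; otherwise return the named properties.
    returned = var if cols == ["*"] else ", ".join(f"{var}.{c}" for c in cols)
    clauses.append(f"RETURN {returned}")
    return " ".join(clauses)


def build_pair(question: str, sql: str) -> dict:
    """Pair a natural language question with the converted Cypher query."""
    return {"question": question, "cypher": toy_sql_to_cypher(sql)}


def execution_accuracy(predicted_results, gold_results) -> float:
    """Fraction of examples whose result rows match the gold rows.

    Assumption: rows are compared as multisets, ignoring row order.
    Each element is the list of row tuples returned for one example.
    """
    if len(predicted_results) != len(gold_results):
        raise ValueError("predicted and gold result lists differ in length")
    if not gold_results:
        return 0.0
    hits = sum(
        Counter(map(tuple, pred)) == Counter(map(tuple, gold))
        for pred, gold in zip(predicted_results, gold_results)
    )
    return hits / len(gold_results)


if __name__ == "__main__":
    pair = build_pair(
        "What are the names of singers aged 30?",
        "SELECT name FROM singer WHERE age = 30",
    )
    print(pair["cypher"])
```

In practice the result rows would come from running each query against the relational database and the corresponding graph database; the sketch leaves that step out and compares already-fetched rows.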

Notes

  1. https://db-engines.com/en/ranking

  2. https://github.com/jOOQ/jOOQ

  3. https://github.com/neo4j-contrib/sql2cypher

  4. https://github.com/mozilla/moz-sql-parser

Acknowledgment

This research is supported by the Australian Research Council through the Centre of Transforming Maintenance through Data Science (grant number IC180100030), funded by the Australian Government.

Author information

Corresponding author

Correspondence to Ziyu Zhao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhao, Z., Liu, W., French, T., Stewart, M. (2024). CySpider: A Neural Semantic Parsing Corpus with Baseline Models for Property Graphs. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_10

  • DOI: https://doi.org/10.1007/978-981-99-8391-9_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8390-2

  • Online ISBN: 978-981-99-8391-9

  • eBook Packages: Computer Science, Computer Science (R0)
