DOI: 10.1145/3528588.3528655
short-paper

From zero to hero: generating training data for question-to-cypher models

Published: 01 February 2023

Abstract

Graph databases employ graph structures such as nodes, attributes, and edges to model and store relationships among data. Accessing this data typically requires a graph query language (GQL) such as Cypher, which can be difficult for end users to master. For relational databases, sequence-to-SQL models have been proposed that translate natural language questions into SQL queries. While these Neural Machine Translation (NMT) models make relational databases more accessible, comparable NMT models for graph databases are not yet available, mainly because suitable parallel training data is lacking. In this short paper, we sketch an architecture that enables the generation of synthetic training data for the graph query language Cypher.
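
To make the notion of parallel training data concrete, the following is a minimal sketch of how question-to-Cypher pairs could be produced by filling paired templates from a graph schema. It is not the architecture proposed in the paper, which the abstract does not detail; the schema, the templates, and the function name `generate_pairs` are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy schema: node labels mapped to their properties.
SCHEMA = {
    "Person": ["name", "age"],
    "Movie": ["title", "released"],
}

# Hypothetical paired templates: (natural language question, Cypher query).
TEMPLATES = [
    (
        "What is the {prop} of every {label_lower}?",
        "MATCH (n:{label}) RETURN n.{prop}",
    ),
    (
        "How many {label_lower} nodes have a {prop}?",
        "MATCH (n:{label}) WHERE n.{prop} IS NOT NULL RETURN count(n)",
    ),
]


def generate_pairs(schema, templates):
    """Fill every template with every (label, property) combination."""
    pairs = []
    for label, props in schema.items():
        for prop, (question_tpl, cypher_tpl) in product(props, templates):
            slots = {"label": label, "label_lower": label.lower(), "prop": prop}
            # str.format ignores slots that a template does not use.
            pairs.append((question_tpl.format(**slots), cypher_tpl.format(**slots)))
    return pairs


if __name__ == "__main__":
    # Emit one tab-separated (question, query) pair per line.
    for question, cypher in generate_pairs(SCHEMA, TEMPLATES):
        print(f"{question}\t{cypher}")
```

Each output line pairs a question with its query, which is the source/target format an NMT toolkit would typically consume. A realistic generator would presumably also need relationship patterns, value sampling, and paraphrasing of the questions, none of which this sketch attempts.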

Cited By

  • (2024) Graph Detective: A User Interface for Intuitive Graph Exploration Through Visualized Queries. Proceedings of the ACM Symposium on Document Engineering 2024, 1-9. DOI: 10.1145/3685650.3685660. Online publication date: 20-Aug-2024

Published In

NLBSE '22: Proceedings of the 1st International Workshop on Natural Language-based Software Engineering
May 2022
87 pages
ISBN:9781450393430
DOI:10.1145/3528588
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SQL
  2. cypher
  3. data generation
  4. machine learning
  5. neural machine translation

Conference

ICSE '22