Translating natural language questions to SQL queries (nested queries)

Swamidorai, Sindhuja; Murthy, T Satyanarayana; Sriharsha, K V

doi:10.1007/s11042-023-16987-2

Published: 21 October 2023

Volume 83, pages 45391–45405 (2024)

Multimedia Tools and Applications

Sindhuja Swamidorai,
T Satyanarayana Murthy &
K V Sriharsha (ORCID: orcid.org/0000-0002-0453-0645)

Abstract

Real-world questions are generally complex and require the user to extract information from multiple tables in a database using complex SQL queries such as nested queries. Although the overall accuracy of translating natural language queries to SQL queries is close to 75%, the accuracy on complex queries is still considerably lower, around 60% in current state-of-the-art models. To address this, this study proposes improvements to the IRNet framework for translating natural language queries to nested SQL queries, one type of complex query. First, data oversampling is used to boost the representation of nested queries. Second, a novel loss function is introduced that computes the overall loss while accounting for the complexity of the SQL, as measured by the number of SELECT columns and keywords in the SQL query. The proposed method exhibited a 5% improvement in the prediction of hard and extra hard queries when tested on Spider’s development dataset.
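
The two ideas named in the abstract, oversampling nested queries and weighting the loss by SQL complexity, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the nested-query heuristic (more than one SELECT), the oversampling `factor`, the `SQL_KEYWORDS` list, and the weighting rule `1 + alpha * (select_columns + keywords)` are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import re

# Hypothetical keyword list; the abstract does not say which keywords are counted.
SQL_KEYWORDS = ("WHERE", "GROUP BY", "ORDER BY", "HAVING", "LIMIT",
                "JOIN", "UNION", "INTERSECT", "EXCEPT", "IN", "EXISTS")


def is_nested(sql):
    """Assumed heuristic: a query is nested if it contains more than one SELECT."""
    return len(re.findall(r"\bSELECT\b", sql, flags=re.IGNORECASE)) > 1


def oversample_nested(examples, factor=2):
    """Repeat each nested example `factor` times; keep all other examples once."""
    out = []
    for ex in examples:
        out.extend([ex] * (factor if is_nested(ex["sql"]) else 1))
    return out


def count_select_columns(sql):
    """Roughly count comma-separated items in the first SELECT clause."""
    m = re.search(r"\bSELECT\b(.*?)\bFROM\b", sql, flags=re.IGNORECASE | re.DOTALL)
    return len(m.group(1).split(",")) if m else 0


def count_keywords(sql):
    """Count whole-word occurrences of the assumed keyword list."""
    total = 0
    for kw in SQL_KEYWORDS:
        # Allow any whitespace between the words of multi-word keywords.
        pattern = r"\b" + r"\s+".join(kw.split()) + r"\b"
        total += len(re.findall(pattern, sql, flags=re.IGNORECASE))
    return total


def complexity_weight(sql, alpha=0.1):
    """Assumed weighting rule: grows linearly with SELECT columns plus keywords."""
    return 1.0 + alpha * (count_select_columns(sql) + count_keywords(sql))


def weighted_loss(losses, sqls, alpha=0.1):
    """Mean of per-example losses, each scaled by its query's complexity weight."""
    weights = [complexity_weight(s, alpha) for s in sqls]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

In this sketch a simple query such as `SELECT name FROM t` receives a smaller weight than a nested one such as `SELECT name, age FROM t WHERE id IN (SELECT id FROM u)`, so errors on the harder query contribute more to the overall loss. In a real training loop the per-example losses would come from the model's decoder rather than being passed in as plain numbers.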

Acknowledgements

The authors would like to thank the Student Society, upGrad Education Pvt Ltd, and the Department of Information Technology, CBIT, Hyderabad, for providing the ML laboratory to conduct the experiments.

Author information

T Satyanarayana Murthy and K V Sriharsha contributed equally to this work.

Authors and Affiliations

Data Science, UpGrad, Street-1, Bengaluru, 500075, Karnataka, India
Sindhuja Swamidorai
Information Technology, CBIT, Gandipet, Hyderabad, 560071, Telangana, India
T Satyanarayana Murthy
Computer Applications, NIT Trichy, Thuvvakudi, Tiruchirappalli, 620015, Tamil Nadu, India
K V Sriharsha

Corresponding author

Correspondence to K V Sriharsha.

Ethics declarations

Competing Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Swamidorai, S., Murthy, T.S. & Sriharsha, K.V. Translating natural language questions to SQL queries (nested queries). Multimed Tools Appl 83, 45391–45405 (2024). https://doi.org/10.1007/s11042-023-16987-2

Received: 23 January 2023
Revised: 24 August 2023
Accepted: 04 September 2023
Published: 21 October 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-16987-2
