Abstract
Real-world questions are often complex and require users to extract information from multiple tables in a database using complex SQL queries such as nested queries. Although the overall accuracy of translating natural language queries to SQL queries is close to 75%, accuracy on complex queries remains considerably lower, around 60% for current state-of-the-art models. This study therefore proposes improvements to the current IRNet framework for translating natural language queries to nested SQL queries, one type of complex query. First, data oversampling is used to boost the representation of nested queries. Second, a novel loss function is introduced that computes the overall loss while accounting for the complexity of the SQL, as measured by the number of SELECT columns and keywords in the SQL query. The proposed method showed a 5% improvement in the prediction of hard and extra hard queries when tested on Spider’s development dataset.
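The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the oversampling ratio, the keyword set, or the exact form of the loss, so the keyword list `SQL_KEYWORDS`, the duplication `factor`, the scaling constant `alpha`, and the weight formula `1 + alpha * (columns + keywords)` are all assumptions chosen for illustration, as are the function names.

```python
import random
import re

# Hypothetical keyword list; the set used in the paper is not stated in the abstract.
SQL_KEYWORDS = ("SELECT", "WHERE", "GROUP BY", "ORDER BY", "HAVING", "LIMIT",
                "JOIN", "INTERSECT", "UNION", "EXCEPT", "IN", "EXISTS")


def is_nested(sql):
    """Treat a query as nested if it contains more than one SELECT."""
    return len(re.findall(r"\bSELECT\b", sql, flags=re.IGNORECASE)) > 1


def oversample_nested(examples, factor=2, seed=0):
    """Return a shuffled copy in which each nested query appears `factor` times."""
    rng = random.Random(seed)
    out = []
    for ex in examples:
        out.extend([ex] * (factor if is_nested(ex["sql"]) else 1))
    rng.shuffle(out)
    return out


def count_select_columns(sql):
    """Count comma-separated items in the first SELECT clause (rough heuristic)."""
    m = re.search(r"\bSELECT\b(.*?)\bFROM\b", sql, flags=re.IGNORECASE | re.DOTALL)
    return len(m.group(1).split(",")) if m else 0


def count_keywords(sql):
    """Count occurrences of the assumed SQL keywords, matching whole words only."""
    total = 0
    for kw in SQL_KEYWORDS:
        pattern = r"\b" + r"\s+".join(kw.split()) + r"\b"
        total += len(re.findall(pattern, sql, flags=re.IGNORECASE))
    return total


def complexity_weight(sql, alpha=0.1):
    """Assumed weight: grows linearly with SELECT columns plus keywords."""
    return 1.0 + alpha * (count_select_columns(sql) + count_keywords(sql))


def weighted_loss(per_example_losses, sqls, alpha=0.1):
    """Average the per-example losses after scaling each by its complexity weight."""
    weighted = [complexity_weight(s, alpha) * l
                for l, s in zip(per_example_losses, sqls)]
    return sum(weighted) / len(weighted)


simple = "SELECT name FROM t"
nested = "SELECT name FROM t WHERE id IN (SELECT id FROM u)"
data = oversample_nested([{"sql": simple}, {"sql": nested}], factor=3)
loss = weighted_loss([1.0, 1.0], [simple, nested])
```

In this sketch the nested query is duplicated in the training data and also receives a larger weight than the simple query, so errors on complex queries contribute more to the overall loss. In a real training loop the per-example losses would come from the model's decoder rather than being fixed numbers.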
Acknowledgements
The author would also like to thank the Student Society, upGrad Education Pvt. Ltd., and the Department of Information Technology, CBIT, Hyderabad, for providing the ML laboratory used to conduct the experiments.
Ethics declarations
Competing Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Swamidorai, S., Murthy, T.S. & Sriharsha, K.V. Translating natural language questions to SQL queries (nested queries). Multimed Tools Appl 83, 45391–45405 (2024). https://doi.org/10.1007/s11042-023-16987-2
DOI: https://doi.org/10.1007/s11042-023-16987-2