Translating natural language questions to SQL queries (nested queries)

Multimedia Tools and Applications

Abstract

Real-world questions are generally complex and require the user to extract information from multiple tables in a database using complex SQL queries such as nested queries. Although the overall accuracy of translating natural language queries to SQL queries is close to 75%, the accuracy on complex queries remains considerably lower, around 60%, in current state-of-the-art models. This study therefore proposes improvements to the IRNet framework for translating natural language queries to nested SQL queries, one type of complex query. First, data oversampling is used to boost the representation of nested queries. Second, a novel loss function is introduced that computes the overall loss while accounting for the complexity of the SQL query, as measured by the number of SELECT columns and keywords in the query. The proposed method exhibited a 5% improvement in the prediction of hard and extra hard queries when tested on Spider’s development dataset.
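
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the oversampling factor, the keyword set, or the exact form of the loss, so the names `KEYWORDS`, `is_nested`, `oversample`, `complexity`, `weighted_loss`, and the scaling term `1 + alpha * complexity` are all assumptions chosen for illustration.

```python
import re
from typing import List, Sequence

# Hypothetical keyword set; the abstract does not list the keywords it counts.
KEYWORDS = {"where", "group", "order", "having", "limit", "join",
            "union", "intersect", "except"}


def tokens(sql: str) -> List[str]:
    """Lowercase alphabetic tokens of a SQL string."""
    return re.findall(r"[a-z_]+", sql.lower())


def is_nested(sql: str) -> bool:
    """Assumption: a query counts as nested if SELECT appears more than once."""
    return tokens(sql).count("select") > 1


def oversample(queries: Sequence[str], factor: int = 2) -> List[str]:
    """Repeat each nested query `factor` times; keep all other queries once."""
    out: List[str] = []
    for q in queries:
        out.extend([q] * (factor if is_nested(q) else 1))
    return out


def complexity(sql: str) -> int:
    """Columns in the first SELECT clause plus the number of keyword tokens."""
    m = re.search(r"select\s+(.*?)\s+from\b", sql, flags=re.I | re.S)
    n_cols = len(m.group(1).split(",")) if m else 0
    n_kw = sum(t in KEYWORDS for t in tokens(sql))
    return n_cols + n_kw


def weighted_loss(losses: Sequence[float], sqls: Sequence[str],
                  alpha: float = 0.1) -> float:
    """Mean of per-example losses, each scaled by 1 + alpha * complexity."""
    scaled = [l * (1.0 + alpha * complexity(s)) for l, s in zip(losses, sqls)]
    return sum(scaled) / len(scaled)


# Usage with two toy queries (table and column names are made up).
simple = "SELECT name FROM t"
nested = "SELECT name, age FROM t WHERE id IN (SELECT id FROM u)"
train_set = oversample([simple, nested], factor=3)
batch_loss = weighted_loss([1.0, 1.0], [simple, nested], alpha=0.5)
```

In this sketch, more complex queries contribute more to the batch loss, so errors on them are penalised more heavily, while oversampling raises how often nested queries are seen during training. A real system would apply the same weighting to per-example losses produced by the model's decoder.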

Acknowledgements

The authors would like to thank the Student Society, upGrad Education Pvt. Ltd., and the Department of Information Technology, CBIT, Hyderabad, for providing the ML laboratory used to conduct the experiments.

Author information

Corresponding author

Correspondence to K V Sriharsha.

Ethics declarations

Competing Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Swamidorai, S., Murthy, T.S. & Sriharsha, K.V. Translating natural language questions to SQL queries (nested queries). Multimed Tools Appl 83, 45391–45405 (2024). https://doi.org/10.1007/s11042-023-16987-2

  • DOI: https://doi.org/10.1007/s11042-023-16987-2
