Large Language Model for Querying Databases in Portuguese

Figueiredo, Lourenço; Pinheiro, Paulo; Cavique, Luís; Marques, Nuno

doi:10.1007/978-3-031-73503-5_1

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14969))

Included in the following conference series:

EPIA Conference on Artificial Intelligence

192 Accesses

Abstract

This study introduces a system that helps non-expert users find information easily without knowing database languages or asking technicians for help. A specific domain is explored, focusing on a subscrip- tion-based sports facility, which serves as an open-source version of a real case study. Utilizing the star schema, the available data in the database is structured to provide accessibility through Portuguese Natural Language queries. Using a Large Language Model (LLM), SQL queries are generated based on the question and the provided star schema. We created a dataset with 115 highly challenging questions drawn from real-world usage scenarios to validate the correctness of the system. Challenges found during testing, like attribute value interpretation, out-of-scope questions, and temporal interval adequacy issues, highlight the insufficiency of the star schema alone in providing the needed context for generating accurate SQL queries by the LLM. Addressing these challenges through enhanced contextual information shows significant improvement in query correctness, with validation results increasing from 57.76% to 88.79%. This study shows the potential and limitations of LLMs in generating SQL queries from Portuguese Natural Language queries.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 74.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Research on the Text2SQL Method Based on Schema Linking Enhanced

Understanding SPARQL Queries: Are We Already There? Multilingual Natural Language Generation Based on SPARQL Queries and Large Language Models

A comparative survey of recent natural language interfaces for databases

Article Open access 28 August 2019

Notes

1.
https://platform.openai.com/docs/api-reference/chat/create.
2.
https://platform.openai.com/docs/api-reference/chat/create.
3.
Additional commented illustrative examples and the full set of questions are available at https://bit.ly/4bJ3cbs.
4.
https://platform.openai.com/docs/models/continuous-model-upgrades.

References

Llama3 blog. https://ai.meta.com/blog/meta-llama-3/
What is a database? https://www.oracle.com/database/what-is-database/
What is natural language processing (NLP)? https://www.ibm.com/topics/natural-language-processing
Brown, T.E.A.: Language models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901. Curran Associates, Inc. (2020)
Google Scholar
Butler, M.A.: Issues and challenges of archiving and storing digital information: preserving the past for future scholars. J. Libr. Adm. 24(4), 61–79 (1997)
Article Google Scholar
Chang, Y., et al.: A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. (2024). https://doi.org/10.1145/3641289. Just Accepted
Deng, J., Lin, Y.: The benefits and challenges of chatgpt: an overview. Front. Comput. Intell. Syst. 2(2), 81–83 (2022)
Article Google Scholar
Gemini Team, et al.: Gemini 1.5: unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530 (2024)
Huang, L., et al.: A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions (2023)
Google Scholar
Jiang, A.Q., et al.: Mistral 7b (2023). https://arxiv.org/abs/2310.06825
Kaplan, J., et al.: Scaling laws for neural language models (2020)
Google Scholar
Katsogiannis-Meimarakis, G., Xydas, M., Koutrika, G.: Natural language interfaces for databases with deep learning. Proc. VLDB Endow. 16(12), 3878–3881 (2023). https://doi.org/10.14778/3611540.3611575
Khurana, D., Koli, A., Khatter, K., Singh, S.: Natural language processing: state of the art, current trends and challenges. Multim. Tools Appl. 82(3), 3713–3744 (2023)
Article Google Scholar
Liddy, E.D.: Natural language processing (2001)
Google Scholar
Majhadi, K., Machkour, M.: The history and recent advances of natural language interfaces for databases querying. E3S Web Conf. 229, 01039 (2021). https://doi.org/10.1051/e3sconf/202122901039
OpenAI, J.A.e.a.: Gpt-4 technical report (2023)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
Google Scholar
Xu, K., et al.: Graph2seq: graph to sequence learning with attention-based neural networks (2018)
Google Scholar
Y., S.L., et al.: Natural language to SQL: automated query formation using NLP techniques. E3S Web Conf. 391, 01115 (2023). https://doi.org/10.1051/e3sconf/202339101115
Yu, T., et al.: Spider: a large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task (2019)
Google Scholar

Download references

Author information

Authors and Affiliations

Universidade Nova de Lisboa, FCT, Costa da Caparica, Portugal
Lourenço Figueiredo & Nuno Marques
CEDIS, Sintra, Portugal
Paulo Pinheiro
Universidade Aberta and Lasige, FCUL, Lisbon, Portugal
Luís Cavique
NOVA LINCS, NOVA School of Science and Technology, Costa da Caparica, Portugal
Nuno Marques

Authors

Lourenço Figueiredo
View author publications
You can also search for this author in PubMed Google Scholar
Paulo Pinheiro
View author publications
You can also search for this author in PubMed Google Scholar
Luís Cavique
View author publications
You can also search for this author in PubMed Google Scholar
Nuno Marques
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lourenço Figueiredo .

Editor information

Editors and Affiliations

University of Minho, Braga, Portugal
Manuel Filipe Santos
University of Minho, Braga, Portugal
José Machado
University of Minho, Braga, Portugal
Paulo Novais
University of Minho, Braga, Portugal
Paulo Cortez
Polytechnic Institute of Viana do Castelo, Viana do Castelo, Portugal
Pedro Miguel Moreira

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Figueiredo, L., Pinheiro, P., Cavique, L., Marques, N. (2025). Large Language Model for Querying Databases in Portuguese. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science(), vol 14969. Springer, Cham. https://doi.org/10.1007/978-3-031-73503-5_1

Download citation

DOI: https://doi.org/10.1007/978-3-031-73503-5_1
Published: 16 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73502-8
Online ISBN: 978-3-031-73503-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Large Language Model for Querying Databases in Portuguese