Abstract
The development and diffusion of ontologies have enabled the creation of large repositories of information covering multiple domains, known as knowledge bases. Ontologies represent information with semantic meaning, which makes the data machine-interpretable. However, exploiting such rich knowledge is difficult for the majority of potential users, who know neither the knowledge-base definition nor how to write SPARQL queries. Systems able to translate natural language questions into SPARQL queries have the potential to overcome this problem. In this paper, we propose an approach that combines Named Entity Recognition and Neural Machine Translation to automatically translate natural language questions into executable SPARQL queries. The resulting approach is robust to terms that do not occur in the training set. We evaluate the potential of our approach on Monument and QALD-9, which are well-known datasets for Question Answering over the DBpedia ontology.
Notes
- 1. Note that we are interested in computing the answers, not in syntactically reproducing the gold query.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Borroto, M., Ricca, F., Cuteri, B. (2022). A Neural-Machine-Translation System Resilient to Out of Vocabulary Words for Translating Natural Language to SPARQL. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_12
DOI: https://doi.org/10.1007/978-3-031-08421-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08420-1
Online ISBN: 978-3-031-08421-8
eBook Packages: Computer Science (R0)