Abstract
Natural language interfaces to databases form a growing field that enables end users to interact with relational databases without technical database skills. These interfaces address the problem of synthesizing SQL queries from a user's natural language input. Although the topic has attracted considerable research interest, few systems to date have been deployed on top of an active enterprise data mart. We present our NL2SQL system designed for the banking sector, which generates a SQL query from a user's natural language question. The system comprises the NL2SQL model we developed, together with a data simulation component and an adaptive feedback framework that continuously improves model performance. The architecture of the NL2SQL model builds on our research on WikiSQL data, which we extended to support multi-table scenarios through our unique table expand process. The data simulation and the feedback loop help the model continuously adjust to the linguistic variation introduced by domain-specific knowledge.
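WikiSQL, the dataset the model builds on, frames each question as filling a fixed query sketch: a single SELECT column, an optional aggregate, and a conjunction of WHERE conditions. The minimal Python sketch below illustrates only that target representation and how a predicted sketch renders to SQL. It is not the authors' model and does not attempt the multi-table table expand process, whose details are not given here; the `accounts` table, its columns, and the `WikiSQLSketch` and `render_sql` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Aggregate operators in the WikiSQL vocabulary; index 0 means "no aggregate".
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
# Comparison operators used in WikiSQL WHERE conditions.
COND_OPS = ["=", ">", "<"]

Value = Union[str, int, float]


@dataclass
class WikiSQLSketch:
    """Slots a WikiSQL-style model predicts for one question (illustrative)."""
    sel: int  # index of the selected column
    agg: int  # index into AGG_OPS
    # Each condition is (column index, index into COND_OPS, literal value).
    conds: List[Tuple[int, int, Value]] = field(default_factory=list)


def render_sql(sketch: WikiSQLSketch, table: str, columns: List[str]) -> str:
    """Render a predicted sketch as SQL text (no escaping; illustration only)."""
    col = columns[sketch.sel]
    agg = AGG_OPS[sketch.agg]
    select = f"{agg}({col})" if agg else col
    sql = f"SELECT {select} FROM {table}"
    if sketch.conds:
        parts = []
        for col_idx, op_idx, value in sketch.conds:
            # Quote string literals; emit numbers as-is.
            literal = f"'{value}'" if isinstance(value, str) else str(value)
            parts.append(f"{columns[col_idx]} {COND_OPS[op_idx]} {literal}")
        # WikiSQL conditions are always joined by AND.
        sql += " WHERE " + " AND ".join(parts)
    return sql


if __name__ == "__main__":
    # Hypothetical banking table and columns, for illustration only.
    cols = ["branch", "account_type", "balance"]
    q = WikiSQLSketch(sel=2, agg=4, conds=[(0, 0, "Downtown"), (2, 1, 1000)])
    print(render_sql(q, "accounts", cols))
    # SELECT SUM(balance) FROM accounts WHERE branch = 'Downtown' AND balance > 1000
```

Because the output space is a small set of slots rather than free-form SQL, each slot can be predicted by a separate classifier; the trade-off is that joins, nesting, and OR conditions fall outside the sketch, which is why a multi-table extension is needed for an enterprise data mart.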
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dong, K., Lu, K., Xia, X., Cieslak, D., Chawla, N.V. (2021). An Optimized NL2SQL System for Enterprise Data Mart. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_21
DOI: https://doi.org/10.1007/978-3-030-86517-7_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86516-0
Online ISBN: 978-3-030-86517-7
eBook Packages: Computer Science, Computer Science (R0)