An Optimized NL2SQL System for Enterprise Data Mart

Dong, Kaiwen; Lu, Kai; Xia, Xin; Cieslak, David; Chawla, Nitesh V.

doi:10.1007/978-3-030-86517-7_21

Kaiwen Dong ORCID: orcid.org/0000-0001-8244-9562¹²,
Kai Lu¹²,
Xin Xia¹²,
David Cieslak¹² &
Nitesh V. Chawla ORCID: orcid.org/0000-0003-3932-5956¹²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12979)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Natural language interfaces to databases, a growing field of research, enable end users to interact with relational databases without technical database skills. These interfaces address the problem of synthesizing SQL queries from a user’s natural language input. There is considerable research interest in the topic, but to date few systems have been deployed on top of an active enterprise data mart. We present our NL2SQL system, designed for the banking sector, which generates a SQL query from a user’s natural language question. The system comprises the NL2SQL model we developed, together with a data simulation component and an adaptive feedback framework that continuously improve model performance. The architecture of the NL2SQL model builds on our research on the WikiSQL dataset, which we extended to support multi-table scenarios via our unique table expand process. The data simulation and the feedback loop help the model continuously adjust to linguistic variation introduced by domain-specific knowledge.
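
To make the NL2SQL task concrete, the sketch below shows how a WikiSQL-style prediction could be rendered into a SQL string. The abstract does not describe the model's output format, so this is an illustration only: it assumes the slot layout of the public WikiSQL dataset (one selected column, one aggregation operator, and a list of AND-ed conditions). The `QuerySketch` class, the `render_sql` and `quote` helpers, and the `accounts` table with its columns are hypothetical names introduced here, not part of the authors' system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Operator vocabularies as used by the public WikiSQL dataset.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]

Value = Union[str, int, float]


@dataclass
class QuerySketch:
    """WikiSQL-style slots: one selected column, one aggregate, AND-ed conditions."""
    sel: int                      # index of the selected column
    agg: int = 0                  # index into AGG_OPS (0 means no aggregation)
    # Each condition is (column index, index into COND_OPS, literal value).
    conds: List[Tuple[int, int, Value]] = field(default_factory=list)


def quote(value: Value) -> str:
    """Render a literal; strings are single-quoted with embedded quotes doubled."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def render_sql(sketch: QuerySketch, table: str, columns: List[str]) -> str:
    """Turn the predicted slots into a single-table SELECT statement."""
    col = columns[sketch.sel]
    agg = AGG_OPS[sketch.agg]
    select = f"{agg}({col})" if agg else col
    sql = f"SELECT {select} FROM {table}"
    if sketch.conds:
        where = " AND ".join(
            f"{columns[c]} {COND_OPS[op]} {quote(v)}" for c, op, v in sketch.conds
        )
        sql += f" WHERE {where}"
    return sql


# Hypothetical question: "How many accounts at the South Bend branch
# hold more than 10000?" (table and column names are assumptions)
columns = ["account_id", "branch", "balance"]
sketch = QuerySketch(sel=0, agg=3, conds=[(1, 0, "South Bend"), (2, 1, 10000)])
print(render_sql(sketch, "accounts", columns))
```

Under these assumptions, the model's job reduces to filling a small number of slots rather than generating free-form SQL text. The multi-table extension mentioned in the abstract is not modeled in this sketch.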

Author information

Authors and Affiliations

Aunalytics, South Bend, IN, 46545, USA
Kaiwen Dong, Kai Lu, Xin Xia, David Cieslak & Nitesh V. Chawla

Corresponding author

Correspondence to Kaiwen Dong.

Editor information

Editors and Affiliations

Facebook AI, Seattle, WA, USA
Yuxiao Dong
Torre Telefonica, Barcelona, Spain
Nicolas Kourtellis
Bielefeld University, CITEC, Bielefeld, Germany
Barbara Hammer
Basque Center for Applied Mathematics, Bilbao, Spain
Jose A. Lozano

About this paper

Cite this paper

Dong, K., Lu, K., Xia, X., Cieslak, D., Chawla, N.V. (2021). An Optimized NL2SQL System for Enterprise Data Mart. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_21

DOI: https://doi.org/10.1007/978-3-030-86517-7_21
Published: 10 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86516-0
Online ISBN: 978-3-030-86517-7
eBook Packages: Computer Science, Computer Science (R0)
