ABSTRACT
Natural language query understanding for unstructured textual sources has seen significant progress over the last couple of decades. For structured data, while the ecosystem has evolved with regard to data storage and retrieval mechanisms, the query language has remained predominantly SQL (or SQL-like). Towards making the latter more natural there has been recent research emphasis on Natural Language Interface to DataBases (NLIDB) systems. Piggybacking on the rise of 'deep learning' systems, the state-of-the-art NLIDB solutions over large parallel and standard benchmarks (viz, WikiSQL and Spider) primarily rely on attention based sequence-to-sequence models.
Building industry grade NLIDB solutions for making big data ecosystem accessible by truly natural and unstructured querying mechanism presents several challenges. These include lack of availability of parallel corpora, diversity in underlying data schema, wide variability in the nature of queries to context and dialog management in interactive systems. In this paper, we present an end-to-end system Query Enterprise Data (QED) towards making enterprise descriptive analytics and reporting easier and natural. We elaborate in detail how we addressed the challenges mentioned above and novel features such as handling incomplete queries in incremental fashion as well as highlight the role of an assistive user interface that provides a better user experience. Finally, we conclude the paper with observations and lessons learnt from the experience of transferring and deploying a research solution to industry grade practical deployment.
- Katrin Affolter, Kurt Stockinger, and Abraham Bernstein. 2019. A Comparative Survey of Recent Natural Language Interfaces for Databases. arXiv preprint arXiv:1906.08990 (2019).Google Scholar
- Ion Androutsopoulos, Graeme D Ritchie, and Peter Thanisch. 1995. Natural language interfaces to databases--an introduction. Natural language engineering 1, 1 (1995), 29--81.Google Scholar
- Sonia Bergamaschi, Francesco Guerra, Matteo Interlandi, Raquel Trillo-Lado, and Yannis Velegrakis. 2013. QUEST: a keyword search system for relational data based on semantic and machine learning techniques. Proceedings of the VLDB Endowment 6, 12 (2013), 1222--1225.Google ScholarDigital Library
- Lukas Blunschi, Claudio Jossen, Donald Kossmann, Magdalini Mori, and Kurt Stockinger. 2012. Soda: Generating sql for business users. Proceedings of the VLDB Endowment 5, 10 (2012), 932--943.Google ScholarDigital Library
- Ben Bogin, Matt Gardner, and Jonathan Berant. 2019. Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing. arXiv preprint arXiv:1905.06241 (2019).Google Scholar
- Li Dong and Mirella Lapata. 2016. Language to Logical Form with Neural Attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 33--43.Google ScholarCross Ref
- Li Dong and Mirella Lapata. 2018. Coarse-to-Fine Decoding for Neural Semantic Parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 731--742.Google ScholarCross Ref
- William A. Gale, Kenneth W. Church, and David Yarowsky. 1992. One Sense Per Discourse. In Proceedings of the Workshop on Speech and Natural Language (HLT '91). Association for Computational Linguistics, Stroudsburg, PA, USA, 233--237. https://doi.org/10.3115/1075527.1075579Google ScholarDigital Library
- Shantanu Godbole and Shourya Roy. 2008. Text Classification, Business Intelligence, and Interactivity: Automating C-Sat Analysis for Services Industry. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08). ACM, New York, NY, USA, 911--919. https://doi.org/10.1145/1401890.1401999Google ScholarDigital Library
- Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. arXiv preprint arXiv:1905.08205 (2019).Google Scholar
- Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a Neural Semantic Parser from User Feedback. In 55th Annual Meeting of the Association for Computational Linguistics.Google ScholarCross Ref
- Fei Li and Hosagrahar V Jagadish. 2014. NaLIR: an interactive natural language interface for querying relational databases. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 709--712.Google ScholarDigital Library
- Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish R Mittal, and Fatma Özcan. 2016. ATHENA: an ontology-driven system for natural language querying over relational data stores. Proceedings of the VLDB Endowment 9, 12 (2016), 1209--1220.Google ScholarDigital Library
- Alkis Simitsis, Georgia Koutrika, and Yannis Ioannidis. 2008. Précis: from unstructured keywords as queries to structured databases as answers. The VLDB JournalâĂŤThe International Journal on Very Large Data Bases 17, 1 (2008), 117--149.Google Scholar
- Dezhao Song, Frank Schilder, Charese Smiley, Chris Brew, Tom Zielund, Hiroko Bretz, Robert Martin, Chris Dale, John Duprey, Tim Miller, et al. 2015. TR discover: A natural language interface for querying and analyzing interlinked datasets. In International Semantic Web Conference. Springer, 21--37.Google ScholarDigital Library
- Xiaojun Xu, Chang Liu, and Dawn Song. 2017. Sqlnet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436 (2017).Google Scholar
- Pengcheng Yin, Zhengdong Lu, Hang Li, and Ben Kao. 2016. Neural enquirer: learning to query tables in natural language. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. AAAI Press, 2308--2314.Google ScholarCross Ref
- Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir R. Radev. 2018. SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-DomainText-to-SQL Task. CoRR abs/1810.05237 (2018). arXiv:1810.05237 http://arxiv.org/abs/1810.05237Google Scholar
- Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 3911--3921.Google ScholarCross Ref
- Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. CoRR abs/1709.00103 (2017), 1--10.Google Scholar
Recommendations
Interactive natural language interface
To override the complexity of SQL, and to facilitate the manipulation of data in databases for common people (not SQL professionals), many researches have turned out to use natural language instead of SQL. The idea of using natural language instead of ...
Generic interactive natural language interface to databases (GINLIDB)
EC'09: Proceedings of the 10th WSEAS international conference on evolutionary computingTo override the complexity of SQL, and to facilitate the manipulation of data in databases for common people (not SQL professionals), many researches have turned out to use natural language instead of SQL. The idea of using natural language instead of ...
Natural language querying of databases
Natural language (NL) interfaces for database (DB) query formulation have always been recognized as a much-needed enhancement for DB end-users. NL systems, however, have shortcomings that have led some DB researchers to question their practicality. The ...
Comments