DOI: 10.1145/3371158.3371198
research-article

A Natural Language and Interactive End-to-End Querying and Reporting System

Published: 15 January 2020

ABSTRACT

Natural language query understanding for unstructured textual sources has seen significant progress over the last couple of decades. For structured data, while the ecosystem has evolved with regard to data storage and retrieval mechanisms, the query language has remained predominantly SQL (or SQL-like). To make the latter more natural, recent research has emphasized Natural Language Interface to Databases (NLIDB) systems. Piggybacking on the rise of 'deep learning' systems, the state-of-the-art NLIDB solutions on large, parallel, standard benchmarks (viz. WikiSQL and Spider) rely primarily on attention-based sequence-to-sequence models.

Building industry-grade NLIDB solutions that make the big data ecosystem accessible through a truly natural and unstructured querying mechanism presents several challenges. These include the lack of parallel corpora, diversity in underlying data schemas, wide variability in the nature of queries, and context and dialog management in interactive systems. In this paper, we present an end-to-end system, Query Enterprise Data (QED), aimed at making enterprise descriptive analytics and reporting easier and more natural. We describe in detail how we addressed these challenges, present novel features such as handling incomplete queries incrementally, and highlight the role of an assistive user interface in providing a better user experience. Finally, we conclude with observations and lessons learnt from transferring a research solution to an industry-grade practical deployment.


  • Published in

    CoDS COMAD 2020: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD
    January 2020
    399 pages
    ISBN: 9781450377386
    DOI: 10.1145/3371158

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    CoDS COMAD 2020 Paper Acceptance Rate: 78 of 275 submissions, 28%
    Overall Acceptance Rate: 197 of 680 submissions, 29%
