ABSTRACT
Natural language and touch-based interfaces are making data querying significantly easier. Typed SQL, however, remains the gold standard for query sophistication, even though typing it is painful in many querying environments. Recent advances in automatic speech recognition raise the tantalizing possibility of bridging this gap by enabling spoken SQL queries. In this work, we outline our vision of one such query interface and system for regular SQL that is primarily speech-driven. We propose an end-to-end architecture for making spoken SQL querying effective and efficient, and we present initial empirical results on the feasibility of such an approach. We also identify several open research questions and propose alternative solutions that we plan to explore.