Abstract
Traditionally, by query answering we mean the problem of finding all answers to a given query over a given database. But what happens if the number of answers is prohibitively large, as may easily occur in a Big Data context? In such situations, it seems preferable to have a mechanism that produces one answer after the other, with certain guarantees on the time between any two outputs, and to let the user decide when to stop. This leads us to the enumeration problem, which has received a lot of interest recently [1]. However, for the user to get a "realistic" picture of the entire set of answers, two crucial questions arise. First, how large is the portion of answers output so far compared with the total number of answers? Second, do the answers output so far reflect the variety of the complete set of answers? The first question refers to the counting problem, where we are interested in the total number of answers. The second question leads us to the problem of uniform generation, where we require that the answers be generated uniformly at random and thus form an unbiased sample of the complete set of answers.
[1] E. Boros, B. Kimelfeld, R. Pichler, and N. Schweikardt. Enumeration in data management (Dagstuhl Seminar 19211). Dagstuhl Reports, 9(5):89--109, 2019.
[2] M. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43:169--188, 1986.