NetCube: Fast, Approximate Database Queries Using Bayesian Networks

Dimitris Margaritis, Christos Faloutsos, Sebastian Thrun
Copyright: © 2009 | Pages: 19
ISBN13: 9781605660981 | ISBN10: 1605660981 | EISBN13: 9781605660998
DOI: 10.4018/978-1-60566-098-1.ch023

MLA

Margaritis, Dimitris, et al. "NetCube: Fast, Approximate Database Queries Using Bayesian Networks." Selected Readings on Database Technologies and Applications, edited by Terry Halpin, IGI Global, 2009, pp. 471-489. https://doi.org/10.4018/978-1-60566-098-1.ch023

APA

Margaritis, D., Faloutsos, C., & Thrun, S. (2009). NetCube: Fast, Approximate Database Queries Using Bayesian Networks. In T. Halpin (Ed.), Selected Readings on Database Technologies and Applications (pp. 471-489). IGI Global. https://doi.org/10.4018/978-1-60566-098-1.ch023

Chicago

Margaritis, Dimitris, Christos Faloutsos, and Sebastian Thrun. "NetCube: Fast, Approximate Database Queries Using Bayesian Networks." In Selected Readings on Database Technologies and Applications, edited by Terry Halpin, 471-489. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-60566-098-1.ch023

Abstract

We present a novel method for answering count queries from a large database approximately and quickly. Our method implements an approximate DataCube of the application domain, which can be used to answer any conjunctive count query that can be formed by the user. The DataCube is a conceptual device that in principle stores the number of matching records for all possible such queries. However, because its size and generation time are inherently exponential, our approach uses one or more Bayesian networks to implement it approximately. Bayesian networks are statistical graphical models that can succinctly represent the underlying joint probability distribution of the domain, and can therefore be used to calculate approximate counts for any conjunctive query combination of attribute values and “don’t cares.” The structure and parameters of these networks are learned from the database in a preprocessing stage. By means of such a network, the proposed method, called NetCube, exploits correlations and independencies among attributes to answer a count query quickly without accessing the database. Our preprocessing algorithm scales linearly with the size of the database and is straightforward to parallelize. We give an algorithm for estimating the count result of arbitrary queries whose running time is constant in the database size. Our experimental results show that NetCubes are fast to generate and use, achieve excellent compression, and have low reconstruction error. Moreover, they naturally allow for visualization and data mining, at no extra cost.
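
The idea of answering a conjunctive count query from a Bayesian network rather than from the data can be illustrated with a minimal sketch. It is not the chapter's algorithm: the three binary attributes `A`, `B`, `C`, the chain structure `A -> B -> C`, the toy `RECORDS` table, and the function names `fit_cpts` and `estimate_count` are all assumptions made for illustration. The structure is fixed by hand rather than learned, and "don't care" attributes are summed out by brute-force enumeration, which a practical system would replace with a proper inference procedure.

```python
from collections import Counter
from itertools import product

# Hypothetical toy domain: three binary attributes with an assumed,
# hand-picked chain structure A -> B -> C (not learned from data).
ATTRS = ("A", "B", "C")
PARENTS = {"A": (), "B": ("A",), "C": ("B",)}
VALUES = (0, 1)

# Hypothetical toy table; each record is a tuple (A, B, C).
RECORDS = (
    [(0, 0, 0)] * 4 + [(0, 1, 1)] * 1 + [(1, 1, 1)] * 3
    + [(1, 0, 0)] * 1 + [(1, 1, 0)] * 1
)


def fit_cpts(records):
    """Estimate P(attr | parents) by relative frequency in one pass over the data."""
    joint = {a: Counter() for a in ATTRS}   # counts of (parent values, own value)
    marg = {a: Counter() for a in ATTRS}    # counts of parent values alone
    for rec in records:
        row = dict(zip(ATTRS, rec))
        for a in ATTRS:
            pa = tuple(row[p] for p in PARENTS[a])
            joint[a][(pa, row[a])] += 1
            marg[a][pa] += 1
    return {
        a: {key: n / marg[a][key[0]] for key, n in joint[a].items()}
        for a in ATTRS
    }


def estimate_count(cpts, n_records, query):
    """Approximate count for a conjunctive query given as {attr: value}.

    Attributes absent from `query` are treated as don't-cares and summed out.
    The records themselves are never consulted here, only the CPTs and N.
    """
    free = [a for a in ATTRS if a not in query]
    total_prob = 0.0
    for combo in product(VALUES, repeat=len(free)):
        row = {**query, **dict(zip(free, combo))}
        prob = 1.0
        for a in ATTRS:
            pa = tuple(row[p] for p in PARENTS[a])
            prob *= cpts[a].get((pa, row[a]), 0.0)  # unseen combination -> 0
        total_prob += prob
    return n_records * total_prob


if __name__ == "__main__":
    cpts = fit_cpts(RECORDS)
    n = len(RECORDS)
    print(estimate_count(cpts, n, {"A": 1}))          # single-attribute query
    print(estimate_count(cpts, n, {"A": 0, "C": 1}))  # B is a don't-care
```

Once the conditional probability tables are fitted, each estimate depends only on the tables and the record count, which is why query time does not grow with the database. Queries that cut across the assumed structure, such as `{"A": 0, "C": 1}`, are generally approximate rather than exact.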
