MauveDB: supporting model-based user views in database systems

Authors:

Amol Deshpande,

Samuel Madden

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Pages 73 - 84

https://doi.org/10.1145/1142473.1142483

Published: 27 June 2006

Abstract

Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impossible to present it to users or feed it directly into applications. The traditional approach to dealing with this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of the data. Current database systems, however, do not provide adequate support for applying models to such data, especially when those models need to be frequently updated as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw data store.

In this paper we define a new abstraction called model-based views and present the architecture of MauveDB, the system we are building to support such views. Just as traditional database views provide logical data independence, model-based views provide independence from the details of the underlying data generating mechanism and hide the irregularities of the data by using models to present a consistent view to the users. MauveDB supports a declarative language for defining model-based views, allows declarative querying over such views using SQL, and supports several different materialization strategies and techniques to efficiently maintain them in the face of frequent updates. We have implemented a prototype system that currently supports views based on regression and interpolation, using the Apache Derby open source DBMS, and we present results that show the utility and performance benefits that can be obtained by supporting several different types of model-based views in a database system.
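To make the idea of a model-based view concrete, the following is a minimal sketch, not MauveDB's actual view-definition language or implementation. It assumes irregularly timed sensor readings given as `(time, value)` pairs and exposes them on a regular time grid, once through linear interpolation and once through a least-squares line. The function names `interpolation_view` and `regression_view`, and the sample data, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def interpolation_view(readings, grid):
    """Present irregular (time, value) readings on a regular grid.

    Assumes `readings` is sorted by time with distinct timestamps.
    Grid points outside the observed time range map to None.
    """
    times = [t for t, _ in readings]
    view = []
    for t in grid:
        if t < times[0] or t > times[-1]:
            view.append((t, None))  # no extrapolation in this sketch
            continue
        i = bisect_left(times, t)
        if times[i] == t:
            view.append((t, readings[i][1]))  # exact observation exists
        else:
            (t0, v0), (t1, v1) = readings[i - 1], readings[i]
            view.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return view


def regression_view(readings, grid):
    """Fit value = a + b * time by least squares and evaluate it on the grid."""
    n = len(readings)
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in readings)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in readings)
    b = sxy / sxx
    a = mean_v - b * mean_t
    return [(t, a + b * t) for t in grid]


# Hypothetical raw readings taken at irregular times.
raw = [(0.0, 20.0), (1.5, 23.0), (4.0, 28.0)]
grid = [0, 1, 2, 3, 4, 5]

interp = interpolation_view(raw, grid)
regr = regression_view(raw, grid)
```

A query against either result sees one value per grid point regardless of when the raw readings were taken, which is the kind of consistent view the abstract describes. In this sketch the interpolation view leaves points outside the observed range empty, while the regression view extrapolates.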

Index Terms

MauveDB: supporting model-based user views in database systems
1. Information systems
  1. Data management systems

Published In

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data

June 2006

830 pages

ISBN: 1595934340

DOI: 10.1145/1142473

General Chairs: Clement Yu (University of Illinois at Chicago), Peter Scheuermann (Northwestern University)

Program Chair: Surajit Chaudhuri (Microsoft Research)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGMOD/PODS06

SIGMOD/PODS06: International Conference on Management of Data and Symposium on Principles of Database Systems

June 27 - 29, 2006

Chicago, IL, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 147

Total Downloads: 1,200

Downloads (Last 12 months): 32

Downloads (Last 6 weeks): 4

Reflects downloads up to 24 Jan 2025

Cited By

Francia M, Rizzi S, Marcel P (2024). Explaining cube measures through Intentional Analytics. Information Systems 121 (102338). Online publication date: Mar-2024.
https://doi.org/10.1016/j.is.2023.102338
Yang Z, Chen S (2023). MOST: Model-Based Compression with Outlier Storage for Time Series Data. Proceedings of the ACM on Management of Data 1:4 (1-29). Online publication date: 12-Dec-2023.
https://dl.acm.org/doi/10.1145/3626737
Liao N, Mo D, Luo S, Li X, Yin P (2022). SCARA. Proceedings of the VLDB Endowment 15:11 (3240-3248). Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.14778/3551793.3551866
Zhou X, Chen L (2022). Migrating social event recommendation over microblogs. Proceedings of the VLDB Endowment 15:11 (3213-3225). Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.14778/3551793.3551864
Ma Q, Triantafillou P (2022). Query-centric regression. Information Systems 104:C. Online publication date: 9-Apr-2022.
https://dl.acm.org/doi/10.1016/j.is.2021.101736
Jensen S, Pedersen T, Thomsen C (2021). Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (1380-1391). Online publication date: Apr-2021.
https://doi.org/10.1109/ICDE51399.2021.00123
Francia M, Marcel P, Peralta V, Rizzi S (2021). Enhancing Cubes with Models to Describe Multidimensional Data. Information Systems Frontiers 24:1 (31-48). Online publication date: 11-Jun-2021.
https://doi.org/10.1007/s10796-021-10147-3
Zhao K, Yu J, Rong Y, Liao M, Huang J (2021). Towards Expectation-Maximization by SQL in RDBMS. Database Systems for Advanced Applications (778-794). Online publication date: 6-Apr-2021.
https://doi.org/10.1007/978-3-030-73197-7_53
Orr L, Balazinska M, Suciu D, Maier D, Pottinger R, Doan A, Tan W, Alawini A, Ngo H (2020). Sample Debiasing in the Themis Open World Database System. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (257-268). Online publication date: 11-Jun-2020.
https://dl.acm.org/doi/10.1145/3318464.3380606
Wu Y, Tannen V, Davidson S, Maier D, Pottinger R, Doan A, Tan W, Alawini A, Ngo H (2020). PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (447-462). Online publication date: 11-Jun-2020.
https://dl.acm.org/doi/10.1145/3318464.3380571
