MauveDB: supporting model-based user views in database systems

Published: 27 June 2006
DOI: 10.1145/1142473.1142483

Abstract

Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impossible to present it to users or feed it directly into applications. The traditional approach to dealing with this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of the data. Current database systems, however, do not provide adequate support for applying models to such data, especially when those models need to be frequently updated as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw data store.

In this paper we define a new abstraction called model-based views and present the architecture of MauveDB, the system we are building to support such views. Just as traditional database views provide logical data independence, model-based views provide independence from the details of the underlying data generating mechanism and hide the irregularities of the data by using models to present a consistent view to the users. MauveDB supports a declarative language for defining model-based views, allows declarative querying over such views using SQL, and supports several different materialization strategies and techniques to efficiently maintain them in the face of frequent updates. We have implemented a prototype system that currently supports views based on regression and interpolation, using the Apache Derby open source DBMS, and we present results that show the utility and performance benefits that can be obtained by supporting several different types of model-based views in a database system.
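
To make the idea of an interpolation-based view concrete, here is a minimal Python sketch. It is not MauveDB's view-definition language or its implementation, neither of which is shown in the abstract. The function `interpolation_view`, the sample readings, and the grid parameters are all hypothetical names and values chosen for illustration. The sketch takes irregularly spaced sensor readings and presents them as rows on a regular time grid, filling gaps by linear interpolation, so that a consumer sees a consistent table rather than the raw, gappy data.

```python
from bisect import bisect_left

# Hypothetical raw readings as (time, temperature); note the gap between t=1 and t=4.
RAW_READINGS = [(0.0, 20.0), (1.0, 21.0), (4.0, 24.0), (5.0, 23.0)]


def interpolation_view(readings, start, stop, step):
    """Return (time, value) rows on a regular grid using linear interpolation.

    Grid points outside the observed time range get value None.
    """
    readings = sorted(readings)
    times = [t for t, _ in readings]
    rows = []
    n_steps = int((stop - start) / step)
    for k in range(n_steps + 1):
        t = start + k * step
        i = bisect_left(times, t)  # first reading with time >= t
        if i < len(times) and times[i] == t:
            value = readings[i][1]  # exact observation exists
        elif 0 < i < len(times):
            (t0, v0), (t1, v1) = readings[i - 1], readings[i]
            value = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        else:
            value = None  # no readings on one side; do not extrapolate
        rows.append((t, value))
    return rows


# The "view": one row per time unit, regardless of when readings actually arrived.
view = interpolation_view(RAW_READINGS, 0.0, 5.0, 1.0)

# A query over the view, analogous to a SELECT with a WHERE clause.
warm = [(t, v) for t, v in view if v is not None and v >= 23.0]
```

In this sketch the view is recomputed from scratch on every call; the abstract's point about materialization strategies concerns precisely how a system could avoid that cost when new readings arrive frequently.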

Published In

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
June 2006
830 pages
ISBN: 1595934340
DOI: 10.1145/1142473

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. query processing
2. regression
3. sensor networks
4. statistical models
5. uncertain data
6. views
