DOI: 10.1145/1142473.1142483
Article

MauveDB: supporting model-based user views in database systems

Published: 27 June 2006

ABSTRACT

Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impossible to present it to users or feed it directly into applications. The traditional approach to dealing with this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of the data. Current database systems, however, do not provide adequate support for applying models to such data, especially when those models need to be frequently updated as new data arrives in the system. Hence, most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all; at best, databases serve as a persistent raw data store.

In this paper we define a new abstraction called model-based views and present the architecture of MauveDB, the system we are building to support such views. Just as traditional database views provide logical data independence, model-based views provide independence from the details of the underlying data generating mechanism and hide the irregularities of the data by using models to present a consistent view to the users. MauveDB supports a declarative language for defining model-based views, allows declarative querying over such views using SQL, and supports several different materialization strategies and techniques to efficiently maintain them in the face of frequent updates. We have implemented a prototype system that currently supports views based on regression and interpolation, using the Apache Derby open source DBMS, and we present results that show the utility and performance benefits that can be obtained by supporting several different types of model-based views in a database system.
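To make the idea of an interpolation-based view concrete, here is a minimal Python sketch. It is not MauveDB's view-definition language or its implementation, neither of which appears in the abstract. The class name `InterpolationView` and its methods are hypothetical, and linear interpolation between neighbouring readings is an assumed model chosen for brevity. The sketch only illustrates the general pattern: raw readings arrive at irregular times, and queries see values on a regular grid that the model fills in.

```python
from bisect import bisect_left, insort


class InterpolationView:
    """Toy model-based view (hypothetical, not MauveDB's API).

    Stores raw (time, value) readings and answers queries at arbitrary
    times by linear interpolation between the two neighbouring readings.
    """

    def __init__(self):
        self._readings = []  # kept sorted by time

    def insert(self, t, value):
        # New raw data simply joins the sorted store; nothing is
        # materialized, so the view is "maintained" lazily at query time.
        insort(self._readings, (t, value))

    def value_at(self, t):
        r = self._readings
        # Outside the observed time range the model makes no claim.
        if not r or t < r[0][0] or t > r[-1][0]:
            return None
        i = bisect_left(r, (t,))  # first reading with time >= t
        if r[i][0] == t:
            return r[i][1]  # an actual reading exists at t
        (t0, v0), (t1, v1) = r[i - 1], r[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    def select(self, start, stop, step):
        # Present a regular grid to the user regardless of gaps
        # in the underlying raw data.
        return [(t, self.value_at(t)) for t in range(start, stop + 1, step)]


view = InterpolationView()
view.insert(0, 20.0)
view.insert(10, 30.0)  # nothing was recorded between t=0 and t=10
print(view.select(0, 10, 5))  # [(0, 20.0), (5, 25.0), (10, 30.0)]
```

Evaluating the model at query time is only one possible strategy. The abstract notes that MauveDB supports several materialization strategies, and a real system would also have to weigh precomputing view contents against the cost of updating them as new readings arrive.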


Published in

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
June 2006
830 pages
ISBN: 1595934340
DOI: 10.1145/1142473

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
