ABSTRACT
Many scientific, financial, data mining, and sensor network applications need to work with continuous rather than discrete data, e.g., temperature as a function of location, or stock prices or vehicle trajectories as a function of time. Querying raw or discrete data is unsatisfactory for these applications: in a sensor network, for example, it is necessary to interpolate sensor readings to predict values at locations where sensors are not deployed. In other situations, raw data can be inaccurate owing to measurement errors, and it is useful to fit continuous functions to the raw data and query the functions rather than the raw data itself, e.g., fitting a smooth curve to noisy sensor readings, or a smooth trajectory to GPS data containing gaps or outliers. Existing databases do not support storing or querying continuous functions, short of brute-force discretization of functions into a collection of tuples. We present FunctionDB, a novel database system that treats mathematical functions as first-class citizens that can be queried like traditional relations. The key contribution of FunctionDB is an efficient and accurate algebraic query processor: for the broad class of multi-variable polynomial functions, FunctionDB executes queries directly on the algebraic representation of functions, without materializing them into discrete points, using the symbolic operations of zero finding, variable substitution, and integration. Even when closed-form solutions are intractable, FunctionDB leverages symbolic approximation operations to improve performance. We evaluate FunctionDB on real data sets from a temperature sensor network and on traffic traces from Boston roads. We show that, compared with existing approaches that represent models as discrete collections of points, operating in the functional domain has substantial advantages in accuracy (15-30%) and yields performance wins of up to one to two orders of magnitude (10x-100x).
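To make the idea of querying in the functional domain concrete, here is a minimal Python sketch. It is not FunctionDB's implementation: the function names (`select_above`, `average`) and the restriction to a single quadratic segment are assumptions made for illustration. It answers a selection predicate by zero finding and an average by exact integration, without sampling the function into discrete points.

```python
import math


def solve_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 in ascending order (also handles a == 0)."""
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    if disc == 0:
        return [-b / (2 * a)]
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])


def evaluate(coeffs, x):
    """Evaluate f(x) = a*x^2 + b*x + c with Horner's rule."""
    a, b, c = coeffs
    return (a * x + b) * x + c


def select_above(coeffs, lo, hi, threshold):
    """Sub-intervals of [lo, hi] where f(x) > threshold, found by zero finding.

    The roots of f(x) - threshold split [lo, hi] into pieces on which the
    predicate has a constant truth value, so one test point per piece suffices.
    """
    a, b, c = coeffs
    roots = [r for r in solve_quadratic(a, b, c - threshold) if lo < r < hi]
    points = [lo] + roots + [hi]
    result = []
    for left, right in zip(points, points[1:]):
        if evaluate(coeffs, (left + right) / 2) > threshold:
            if result and result[-1][1] == left:
                # Merge pieces that touch at a double root.
                result[-1] = (result[-1][0], right)
            else:
                result.append((left, right))
    return result


def average(coeffs, lo, hi):
    """Mean value of f over [lo, hi], computed from the exact antiderivative."""
    a, b, c = coeffs

    def antiderivative(x):
        return a * x**3 / 3 + b * x**2 / 2 + c * x

    return (antiderivative(hi) - antiderivative(lo)) / (hi - lo)


# Hypothetical model: f(t) = -t^2 + 10t on the interval [0, 10].
model = (-1, 10, 0)
intervals = select_above(model, 0, 10, 16)   # where is f(t) > 16?
mean_value = average(model, 0, 10)           # average of f over [0, 10]
```

A discretizing approach would instead evaluate the function on a grid and filter or average the samples, so its answer depends on the grid spacing. Here the interval endpoints come directly from the roots of the polynomial.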
Index Terms
- Querying continuous functions in a database system