Querying and Cleaning Uncertain Data

  • Conference paper
Quality of Context (QuaCon 2009)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 5786)

Abstract

The management of uncertainty in large databases has recently attracted tremendous research interest. Data uncertainty is inherent in many emerging and important applications, including location-based services, wireless sensor networks, biometric and biological databases, and data stream applications. In these systems, data uncertainty must be managed carefully in order to make correct decisions and provide high-quality services to users. To enable the development of these applications, uncertain database systems have been proposed. They treat data uncertainty as a “first-class citizen”: they use generic data models to capture uncertainty and provide query operators that return answers with statistical confidences.

We summarize our recent work on uncertain databases. We explain how data uncertainty can be modeled, and present a classification of probabilistic queries (e.g., range queries and nearest-neighbor queries). We further study how probabilistic queries can be evaluated efficiently and supported by indexes. We also highlight the problem of removing uncertainty under a stringent cleaning budget, with the aim of generating high-quality probabilistic answers.
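To make the notion of a probabilistic range query concrete, the following is a minimal sketch and not the paper's implementation. It assumes each uncertain value is a one-dimensional interval with a uniform distribution, which is only one of many possible uncertainty models. The names `UncertainValue`, `range_probability`, and `probabilistic_range_query`, as well as the sample sensor readings, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UncertainValue:
    """A 1-D attribute known only to lie in [lo, hi] (assumed uniform)."""
    name: str
    lo: float
    hi: float


def range_probability(obj, q_lo, q_hi):
    """Probability that obj's true value lies inside [q_lo, q_hi]."""
    if obj.hi == obj.lo:
        # Degenerate interval: the value is known exactly.
        return 1.0 if q_lo <= obj.lo <= q_hi else 0.0
    # Length of the overlap between the uncertainty interval and the query.
    overlap = min(obj.hi, q_hi) - max(obj.lo, q_lo)
    return max(0.0, overlap) / (obj.hi - obj.lo)


def probabilistic_range_query(objects, q_lo, q_hi, threshold=0.0):
    """Return (name, probability) pairs whose probability exceeds threshold."""
    results = []
    for obj in objects:
        p = range_probability(obj, q_lo, q_hi)
        if p > threshold:
            results.append((obj.name, p))
    return results


# Hypothetical sensor readings, each reported as an uncertainty interval.
sensors = [
    UncertainValue("s1", 20.0, 24.0),
    UncertainValue("s2", 25.0, 35.0),
    UncertainValue("s3", 10.0, 12.0),
]
answers = probabilistic_range_query(sensors, 22.0, 30.0, threshold=0.4)
print(answers)
```

Each answer carries a probability rather than a plain yes or no, which is the sense in which query operators return answers with statistical confidences. Under this simplified model, narrowing an object's interval (for example by probing a sensor again) pushes its probability toward 0 or 1, which gives an intuition for why cleaning can improve answer quality.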

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cheng, R. (2009). Querying and Cleaning Uncertain Data. In: Rothermel, K., Fritsch, D., Blochinger, W., Dürr, F. (eds) Quality of Context. QuaCon 2009. Lecture Notes in Computer Science, vol 5786. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04559-2_4

  • DOI: https://doi.org/10.1007/978-3-642-04559-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04558-5

  • Online ISBN: 978-3-642-04559-2

  • eBook Packages: Computer Science (R0)