
A Top-k Filter for Logic-Based Similarity Conditions on Probabilistic Databases

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7503)

Abstract

Probabilistic databases have been established as a powerful technique for managing and analysing large uncertain data sets. A major challenge for probabilistic databases is query evaluation: even for some simple relational queries, exact probability computation is \(\#\mathcal{P}\)-hard. Consequently, if we are only interested in the \(k\) highest-ranked tuples, efficient pre-filtering can reduce the computation time significantly. In this work we present a top-k filter that, for a complex relational query, computes a small candidate set for the top-k answer in polynomial time.
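The abstract does not spell out how the filter works, so the following is only a generic sketch of bound-based top-k pruning, not the authors' algorithm. It assumes that a cheap lower and upper bound on each tuple's probability is already available; the function name `topk_candidates` and the sample data are hypothetical. A tuple is dropped when its upper bound lies below the k-th largest lower bound, since at least k other tuples are then guaranteed to rank above it.

```python
import heapq
from typing import Dict, List, Tuple

# (lower, upper) bound on a tuple's exact probability; assumed to be cheap to obtain
Bounds = Tuple[float, float]


def topk_candidates(bounds: Dict[str, Bounds], k: int) -> List[str]:
    """Return the ids of all tuples that may still belong to the top-k answer."""
    if k <= 0:
        return []
    lowers = [lo for lo, _ in bounds.values()]
    if len(lowers) <= k:
        # Fewer than k+1 tuples: nothing can be pruned.
        return sorted(bounds)
    # k-th largest lower bound: at least k tuples are guaranteed to reach it.
    threshold = heapq.nlargest(k, lowers)[-1]
    # Keep every tuple whose upper bound can still reach the threshold.
    return sorted(t for t, (_, hi) in bounds.items() if hi >= threshold)


# Hypothetical bounds for five result tuples.
example_bounds = {
    "t1": (0.80, 0.90),
    "t2": (0.10, 0.85),
    "t3": (0.60, 0.70),
    "t4": (0.05, 0.30),
    "t5": (0.20, 0.55),
}

print(topk_candidates(example_bounds, 2))  # ['t1', 't2', 't3']
```

Only the surviving candidates would then need the expensive exact probability computation. How small the candidate set becomes depends entirely on how tight the assumed bounds are.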

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lehrack, S., Saretz, S. (2012). A Top-k Filter for Logic-Based Similarity Conditions on Probabilistic Databases. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2012. Lecture Notes in Computer Science, vol 7503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33074-2_20

  • DOI: https://doi.org/10.1007/978-3-642-33074-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33073-5

  • Online ISBN: 978-3-642-33074-2

  • eBook Packages: Computer Science, Computer Science (R0)
