
Incorporating Domain Knowledge and User Expertise in Probabilistic Tuple Merging

  • Conference paper
Scalable Uncertainty Management (SUM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6929)

Abstract

Today, probabilistic databases (PDBs) have become useful in several application areas. When cleaning a single PDB or integrating multiple PDBs, duplicate tuples need to be merged. A basic approach for merging probabilistic tuples is simply to build the union of their sets of possible instances. In a merging process, however, additional domain knowledge or user expertise is often available. In this paper, we therefore extend the basic approach with aggregation functions, knowledge rules, and instance weights to incorporate such external knowledge into the merging process.
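
The basic union merge, and how knowledge rules and instance weights might refine it, can be illustrated with a small sketch. This is not the authors' formalism. It assumes that each probabilistic tuple is a list of mutually exclusive alternatives whose probabilities sum to 1, that all duplicates are trusted equally, and that knowledge rules and instance weights are plain Python callables. The names `merge_union` and `merge_with_knowledge` are hypothetical, and aggregation functions are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical representation (not the paper's formalism): a possible
# instance is a tuple of attribute values, and a probabilistic tuple is a
# list of mutually exclusive (instance, probability) alternatives.
Instance = Tuple[str, ...]
ProbTuple = List[Tuple[Instance, float]]


def merge_union(tuples: List[ProbTuple]) -> Dict[Instance, float]:
    """Basic merge: the union of the duplicates' sets of possible instances.

    Assumes every duplicate is trusted equally, so each contributes its
    alternatives scaled by 1/len(tuples); identical instances accumulate.
    """
    if not tuples:
        raise ValueError("need at least one tuple to merge")
    merged: Dict[Instance, float] = defaultdict(float)
    share = 1.0 / len(tuples)
    for alternatives in tuples:
        for instance, p in alternatives:
            merged[instance] += share * p
    return dict(merged)


def merge_with_knowledge(
    tuples: List[ProbTuple],
    rules: Iterable[Callable[[Instance], bool]] = (),
    weight: Optional[Callable[[Instance], float]] = None,
) -> Dict[Instance, float]:
    """Union merge refined by knowledge rules and instance weights.

    rules:  predicates every surviving instance must satisfy (hypothetical
            stand-in for domain knowledge); violating instances are dropped.
    weight: non-negative factor per instance (hypothetical stand-in for user
            expertise); defaults to 1 for every instance.
    The surviving instances are renormalized to sum to 1.
    """
    rule_list = list(rules)
    scored: Dict[Instance, float] = {}
    for instance, p in merge_union(tuples).items():
        if all(rule(instance) for rule in rule_list):
            factor = weight(instance) if weight is not None else 1.0
            scored[instance] = p * factor
    total = sum(scored.values())
    if total <= 0:
        raise ValueError("no possible instance survives the rules and weights")
    return {instance: p / total for instance, p in scored.items()}


if __name__ == "__main__":
    # Two duplicates describing the same person with uncertain name and city.
    t1 = [(("Smith", "Berlin"), 0.7), (("Smith", "Bonn"), 0.3)]
    t2 = [(("Smith", "Berlin"), 0.4), (("Smyth", "Berlin"), 0.6)]
    print(merge_union([t1, t2]))
    # Assumed domain knowledge: the name is known to be spelled "Smith".
    print(merge_with_knowledge([t1, t2], rules=[lambda i: i[0] == "Smith"]))
```

In this sketch, rules are applied before renormalizing, so a rule only removes probability mass that the sources assigned to instances known to be impossible; a weight, by contrast, shifts mass among instances that remain possible.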

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Panse, F., Ritter, N. (2011). Incorporating Domain Knowledge and User Expertise in Probabilistic Tuple Merging. In: Benferhat, S., Grant, J. (eds) Scalable Uncertainty Management. SUM 2011. Lecture Notes in Computer Science, vol 6929. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23963-2_23

  • DOI: https://doi.org/10.1007/978-3-642-23963-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23962-5

  • Online ISBN: 978-3-642-23963-2

  • eBook Packages: Computer Science, Computer Science (R0)