Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XML

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Abstract

In this vision paper, we study the problem of integrating several web data sources under uncertainty and dependencies. We present a concrete application involving web sources about objects in the maritime domain, where uncertainties and dependencies are omnipresent. Uncertainties are mainly caused by imprecise information trackers and imperfect human knowledge. Dependencies come from the recurrent copying relationships among the sources. We address data integration in such a setting by reformulating it as the merging of several uncertain versions of the same global XML document. As an initial result, we put forward a probabilistic XML data integration model that draws on intuitions from the versioning model with uncertain data we proposed in [5]. We explain how this model can be used to materialize the integration outcome.
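To make the idea of a merged document with uncertain nodes more concrete, here is a minimal sketch. It is not the model of the paper: it only assumes that each source is associated with one independent Boolean event variable, and that each candidate node of the merged document carries a conjunction of event literals stating when it exists. The event names, probabilities, and node labels are all hypothetical and chosen for illustration only.

```python
from itertools import product

# Hypothetical event variables, one per source, each with an assumed
# probability that the source is right. These numbers are illustrative.
EVENT_PROB = {"e_source_a": 0.8, "e_source_b": 0.6}

# Hypothetical candidate nodes of the merged document. Each condition is a
# conjunction of literals: event name -> truth value required for the node.
NODES = {
    "tonnage=value_A": {"e_source_a": True},
    "tonnage=value_B": {"e_source_a": False, "e_source_b": True},
}


def node_probability(condition, event_prob=EVENT_PROB):
    """Sum the probabilities of all event valuations satisfying the condition."""
    names = sorted(event_prob)
    total = 0.0
    # Enumerate every truth assignment of the event variables (possible worlds).
    for values in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, values))
        if all(valuation[name] == wanted for name, wanted in condition.items()):
            weight = 1.0
            for name in names:
                p = event_prob[name]
                weight *= p if valuation[name] else 1.0 - p
            total += weight
    return total


if __name__ == "__main__":
    for label, condition in NODES.items():
        print(label, round(node_probability(condition), 4))
```

In this sketch, a source that merely copies another could reuse the event variable of the source it copies from, so that agreement between the two does not count as independent evidence. That treatment of dependencies is an assumption of the sketch, not a statement of how the paper handles them.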

Notes

  1. All the screen-shots given in Fig. 2 were captured on January 8th, 2014 from http://www.flickr.com/search/?q=CostaSerena, http://en.wikipedia.org/wiki/Costa_Serena, http://www.shippingexplorer.net/en/vessels/view/14429-costa-serena, http://www.marinetraffic.com/ais/details/ships/247187600, and http://www.grosstonnage.com/.

References

  1. Abiteboul, S., Kimelfeld, B., Sagiv, Y., Senellart, P.: On the expressiveness of probabilistic XML models. VLDB J. 18, 1041–1064 (2009)

  2. Agrawal, P., Sarma, A.D., Ullman, J., Widom, J.: Foundations of uncertain-data integration. Proc. VLDB Endow. 3, 1080–1090 (2010)

  3. Ayat, N., Afsarmanesh, H., Akbarinia, R., Valduriez, P.: An uncertain data integration system. In: Meersman, R., et al. (eds.) OTM 2012, Part II. LNCS, vol. 7566, pp. 825–842. Springer, Heidelberg (2012)

  4. Ba, M.L., Abdessalem, T., Senellart, P.: Merging uncertain multi-version XML documents. In: Proceedings of DChanges, Florence, Italy (2013)

  5. Ba, M.L., Abdessalem, T., Senellart, P.: Uncertain version control in open collaborative editing of tree-structured documents. In: Proceedings of Document Engineering (2013)

  6. Cobena, G., Abdessalem, T., Hinnach, Y.: A comparative study for XML change detection. In: BDA (2002)

  7. Das Sarma, A., Dong, X., Halevy, A.: Bootstrapping pay-as-you-go data integration systems. In: Proceedings of SIGMOD (2008)

  8. Dong, X., Halevy, A.Y., Yu, C.: Data integration with uncertainty. In: Proceedings of VLDB (2007)

  9. Dong, X.L., Berti-Equille, L., Hu, Y., Srivastava, D.: Global detection of complex copying relationships between sources. Proc. VLDB Endow. 3, 1358–1369 (2010)

  10. Dong, X.L., Berti-Equille, L., Srivastava, D.: Integrating conflicting data: the role of source dependence. Proc. VLDB Endow. 2, 550–561 (2009)

  11. Kharlamov, E., Nutt, W., Senellart, P.: Updating probabilistic XML. In: Proceedings of EDBT/ICDT Workshops (2010)

  12. Kimelfeld, B., Senellart, P.: Probabilistic XML: models and complexity. In: Ma, Z., Yan, L. (eds.) Advances in Probabilistic Databases for Uncertain Information Management. Springer, Heidelberg (2013)

  13. Li, X., Dong, X.L., Lyons, K., Meng, W., Srivastava, D.: Truth finding on the deep Web: is the problem solved? In: Proceedings of VLDB, Sept 2013

  14. Lindholm, T., Kangasharju, J., Tarkoma, S.: Fast and simple XML tree differencing by sequence alignment. In: Proceedings on Document Engineering (2006)

  15. Peters, L.: Change detection in XML trees: a survey. In: TSIT Conference (2005)

  16. van Keulen, M., de Keijzer, A.: Qualitative effects of knowledge rules and user feedback in probabilistic data integration. VLDB J. 18, 1191–1217 (2009)

  17. van Keulen, M., de Keijzer, A., Alink, W.: A probabilistic XML approach to data integration. In: Proceedings of ICDE (2005)

Acknowledgements

We are grateful to Pierre Senellart and Stephane Bressan for their valuable remarks and suggestions. This work was partially funded by the NORMATIS project and by the French government under the STIC-Asia program (CCIPX project).

Author information

Corresponding author

Correspondence to M. Lamine Ba.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ba, M.L., Montenez, S., Tang, R., Abdessalem, T. (2014). Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XML. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_28

  • DOI: https://doi.org/10.1007/978-3-662-43984-5_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43983-8

  • Online ISBN: 978-3-662-43984-5

  • eBook Packages: Computer Science, Computer Science (R0)
