Abstract
In this vision paper, we study the problem of integrating several web data sources under uncertainty and dependencies. We present a concrete application with web sources about objects in the maritime domain, where uncertainties and dependencies are omnipresent. Uncertainties are mainly caused by imprecise information trackers and imperfect human knowledge. Dependencies come from the recurrent copying relationships among the sources. We address data integration in such a setting by reformulating it as the merge of several uncertain versions of the same global XML document. As an initial result, we put forward a probabilistic XML data integration model that draws on the versioning model with uncertain data we proposed in [5]. We explain how this model can be used to materialize the integration outcome.
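To make the idea of merging uncertain versions into one global document more concrete, the following is a minimal toy sketch and not the model proposed in the paper. It assumes that each source is an independent Boolean event with a hypothetical reliability value, and that a node of the merged tree exists if at least one source asserting it is reliable. The names `Node`, `merge`, `node_probability`, the source identifiers, and all reliability values are illustrative assumptions. In particular, the independence assumption ignores the copying dependencies that the paper sets out to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the merged tree, annotated with the sources asserting it."""
    label: str
    sources: set = field(default_factory=set)
    children: dict = field(default_factory=dict)


def merge(root, source, paths):
    """Merge one source's version, given as root-to-leaf label paths."""
    for path in paths:
        node = root
        for label in path:
            # Reuse an existing child with the same label, else create it.
            node = node.children.setdefault(label, Node(label))
            node.sources.add(source)


def node_probability(node, reliability):
    """P(at least one asserting source is reliable), sources independent."""
    p_none = 1.0
    for s in node.sources:
        p_none *= 1.0 - reliability[s]
    return 1.0 - p_none


# Hypothetical sources and reliabilities (assumptions, not from the paper).
reliability = {"s1": 0.5, "s2": 0.5}
root = Node("doc")
merge(root, "s1", [("vessel", "name"), ("vessel", "flag")])
merge(root, "s2", [("vessel", "name")])

vessel = root.children["vessel"]
p_name = node_probability(vessel.children["name"], reliability)
p_flag = node_probability(vessel.children["flag"], reliability)
print(p_name, p_flag)
```

A node stated by both sources ends up more probable than one stated by a single source. If one source merely copied the other, that gain would be unjustified, which is why dependencies between sources need to be modeled explicitly.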
Notes
1. All the screenshots given in Fig. 2 were captured on January 8th, 2014 from http://www.flickr.com/search/?q=CostaSerena, http://en.wikipedia.org/wiki/Costa_Serena, http://www.shippingexplorer.net/en/vessels/view/14429-costa-serena, http://www.marinetraffic.com/ais/details/ships/247187600, and http://www.grosstonnage.com/.
References
Abiteboul, S., Kimelfeld, B., Sagiv, Y., Senellart, P.: On the expressiveness of probabilistic XML models. VLDB J. 18, 1041–1064 (2009)
Agrawal, P., Sarma, A.D., Ullman, J., Widom, J.: Foundations of uncertain-data integration. Proc. VLDB Endow. 3, 1080–1090 (2010)
Ayat, N., Afsarmanesh, H., Akbarinia, R., Valduriez, P.: An uncertain data integration system. In: Meersman, R., et al. (eds.) OTM 2012, Part II. LNCS, vol. 7566, pp. 825–842. Springer, Heidelberg (2012)
Ba, M.L., Abdessalem, T., Senellart, P.: Merging uncertain multi-version XML documents. In: Proceedings of DChanges, Florence, Italy (2013)
Ba, M.L., Abdessalem, T., Senellart, P.: Uncertain version control in open collaborative editing of tree-structured documents. In: Proceedings of Document Engineering (2013)
Cobena, G., Abdessalem, T., Hinnach, Y.: A comparative study for XML change detection. In: BDA (2002)
Das Sarma, A., Dong, X., Halevy, A.: Bootstrapping pay-as-you-go data integration systems. In: Proceedings of SIGMOD (2008)
Dong, X., Halevy, A.Y., Yu, C.: Data integration with uncertainty. In: Proceedings of VLDB (2007)
Dong, X.L., Berti-Equille, L., Hu, Y., Srivastava, D.: Global detection of complex copying relationships between sources. Proc. VLDB Endow. 3, 1358–1369 (2010)
Dong, X.L., Berti-Equille, L., Srivastava, D.: Integrating conflicting data: the role of source dependence. Proc. VLDB Endow. 2, 550–561 (2009)
Kharlamov, E., Nutt, W., Senellart, P.: Updating probabilistic XML. In: Proceedings of EDBT/ICDT Workshops (2010)
Kimelfeld, B., Senellart, P.: Probabilistic XML: models and complexity. In: Ma, Z., Yan, L. (eds.) Advances in Probabilistic Databases for Uncertain Information Management. Springer, Heidelberg (2013)
Li, X., Dong, X.L., Lyons, K., Meng, W., Srivastava, D.: Truth finding on the deep Web: is the problem solved? In: Proceedings of VLDB (2013)
Lindholm, T., Kangasharju, J., Tarkoma, S.: Fast and simple XML tree differencing by sequence alignment. In: Proceedings of Document Engineering (2006)
Peters, L.: Change detection in XML trees: a survey. In: TSIT Conference (2005)
van Keulen, M., de Keijzer, A.: Qualitative effects of knowledge rules and user feedback in probabilistic data integration. VLDB J. 18, 1191–1217 (2009)
van Keulen, M., de Keijzer, A., Alink, W.: A probabilistic XML approach to data integration. In: Proceedings of ICDE (2005)
Acknowledgements
We are grateful to Pierre Senellart and Stephane Bressan for their valuable remarks and suggestions. This work was partially funded by the NORMATIS project and by the French government under the STIC-Asia program, CCIPX project.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ba, M.L., Montenez, S., Tang, R., Abdessalem, T. (2014). Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XML. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_28
DOI: https://doi.org/10.1007/978-3-662-43984-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)