Abstract
This chapter describes a framework for querying heterogeneous XML data sources that extends previous approaches to approximate query evaluation by providing techniques for combining partial answers coming from different sources. The approach does not rely on a global schema shared by the sources. Instead, it automatically adapts the query to the available data and returns to the user the XML elements that satisfy the query to a certain extent. Based on this framework, a query language is described that allows as much information as possible to be collected from several heterogeneous XML sources. The chapter provides an algorithm for approximately evaluating a query on a single source and a strategy for joining partial results coming from different sources. Finally, an experimental validation of the approach in a peer-to-peer application scenario is presented.
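The abstract does not spell out how partial answers from different sources are combined. As a rough illustration only, the Python sketch below makes three assumptions of its own. First, each source returns partial answers as flat maps from query element names to values. Second, answers from different sources describe the same object when they share a value for a chosen key element. Third, the extent to which a merged answer satisfies the query is measured as the fraction of query elements it covers. The names `PartialAnswer`, `coverage` and `join_partial_answers` are hypothetical. This is not the chapter's algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (not from the chapter): a partial answer maps
# query element names (e.g. "title", "author") to values found in one source.
PartialAnswer = Dict[str, str]


def coverage(answer: PartialAnswer, query_elements: List[str]) -> float:
    """Assumed satisfaction measure: fraction of query elements present (0.0 to 1.0)."""
    if not query_elements:
        return 0.0
    found = sum(1 for name in query_elements if name in answer)
    return found / len(query_elements)


def join_partial_answers(
    sources: List[List[PartialAnswer]], key: str, query_elements: List[str]
) -> List[Tuple[PartialAnswer, float]]:
    """Merge partial answers from several sources that agree on `key`.

    Answers lacking the key element cannot be matched and are skipped.
    If sources disagree on a non-key element, the first value seen is kept.
    Returns (merged_answer, coverage) pairs sorted by decreasing coverage.
    """
    merged: Dict[str, PartialAnswer] = {}
    for answers in sources:
        for answer in answers:
            if key not in answer:
                continue
            target = merged.setdefault(answer[key], {})
            for name, value in answer.items():
                target.setdefault(name, value)
    results = [(ans, coverage(ans, query_elements)) for ans in merged.values()]
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Usage: two sources, each holding only part of the requested information.
source_a = [{"title": "XML Basics", "author": "Rossi"}]
source_b = [{"title": "XML Basics", "year": "2009"}, {"title": "P2P Nets", "year": "2008"}]
for answer, score in join_partial_answers([source_a, source_b], "title", ["title", "author", "year"]):
    print(answer, round(score, 2))
```

Neither source alone answers the whole query for "XML Basics", but the joined answer covers all three requested elements, while "P2P Nets" is still returned with a lower degree of satisfaction. A real system would also need approximate matching of element names and values across schemas, which this sketch omits.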
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fazzinga, B. (2010). Exploiting Vague Queries to Collect Data from Heterogeneous XML Sources. In: Ma, Z., Yan, L. (eds) Soft Computing in XML Data Management. Studies in Fuzziness and Soft Computing, vol 255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14010-5_5
DOI: https://doi.org/10.1007/978-3-642-14010-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14009-9
Online ISBN: 978-3-642-14010-5
eBook Packages: Engineering, Engineering (R0)