
Exploiting Vague Queries to Collect Data from Heterogeneous XML Sources

  • Chapter
Soft Computing in XML Data Management

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 255)


Abstract

This chapter describes a framework for querying heterogeneous XML data sources. It extends previous approaches to approximate query evaluation with techniques for combining partial answers coming from different sources. The approach does not rely on a global schema shared by the sources; instead, it automatically adapts the query to the available data and returns the XML elements that satisfy the query to a certain extent. Based on this framework, a query language is described that allows as much information as possible to be collected from several heterogeneous XML sources. An algorithm for approximately evaluating a query on a single source and a strategy for joining partial results coming from different sources are provided. Finally, an experimental validation of the approach in a peer-to-peer application scenario is presented.
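The general idea of scoring partial matches per source and then joining them can be illustrated with a small sketch. This is not the chapter's algorithm or query language, neither of which is shown in this preview. The sample documents, the function names `partial_answers` and `join_partial`, the "fraction of requested fields found" score, and the exact-match join on a key field are all assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources describing the same book with different structure.
SOURCE_A = "<library><book><title>X</title><author>Y</author></book></library>"
SOURCE_B = "<catalog><book><info><title>X</title></info><price>10</price></book></catalog>"


def partial_answers(xml_text, record_tag, wanted):
    """Return (fields, score) per record; score is the fraction of wanted fields found."""
    root = ET.fromstring(xml_text)
    answers = []
    for rec in root.iter(record_tag):
        fields = {}
        for tag in wanted:
            # Search at any depth below the record, so nesting differences are tolerated.
            node = rec.find(f".//{tag}")
            if node is not None and node.text:
                fields[tag] = node.text.strip()
        if fields:
            answers.append((fields, len(fields) / len(wanted)))
    return answers


def join_partial(answer_lists, key, wanted):
    """Merge partial answers that share the same value for `key`, then rescore them."""
    merged = {}
    for answers in answer_lists:
        for fields, _score in answers:
            if key not in fields:
                continue  # cannot be joined without the key field
            merged.setdefault(fields[key], {}).update(fields)
    scored = [(f, len(f) / len(wanted)) for f in merged.values()]
    return sorted(scored, key=lambda pair: -pair[1])


wanted = ["title", "author", "price"]
from_a = partial_answers(SOURCE_A, "book", wanted)
from_b = partial_answers(SOURCE_B, "book", wanted)
combined = join_partial([from_a, from_b], "title", wanted)
```

In this toy setting each source satisfies only two of the three requested fields, while the joined record covers all three. A realistic system would also need approximate matching of tag names and of join values, which this sketch deliberately leaves out.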




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fazzinga, B. (2010). Exploiting Vague Queries to Collect Data from Heterogeneous XML Sources. In: Ma, Z., Yan, L. (eds) Soft Computing in XML Data Management. Studies in Fuzziness and Soft Computing, vol 255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14010-5_5


  • DOI: https://doi.org/10.1007/978-3-642-14010-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14009-9

  • Online ISBN: 978-3-642-14010-5

  • eBook Packages: Engineering, Engineering (R0)
