DOI: 10.1145/1007568.1007581

FleXPath: flexible structure and full-text querying for XML

Published: 13 June 2004

ABSTRACT

Querying XML data is a well-explored topic with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying XML documents is full-text search on textual content. In this paper, we study fundamental challenges that arise when we try to integrate these two querying paradigms. While keyword search is based on approximate matching, XPath has exact match semantics. We address this mismatch by considering queries on structure as a "template", and looking for answers that best match this template and the full-text search. To achieve this, we provide an elegant definition of relaxation on structure and define primitive operators to span the space of relaxations. Query answering is now based on ranking potential answers on structural and full-text search conditions. We set out certain desirable principles for ranking schemes and propose natural ranking schemes that adhere to these principles. We develop efficient algorithms for answering top-K queries and discuss results from a comprehensive set of experiments that demonstrate the utility and scalability of the proposed framework and algorithms.
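
The idea of treating a structural query as a template and ranking relaxed matches can be illustrated with a small Python sketch. This is not the paper's algorithm or its relaxation operators: the sample document, the single relaxation (accepting a descendant where the template asks for a child), the `penalty` value, the keyword score, and the choice to multiply the two scores are all assumptions made for illustration, and `top_k` and `keyword_score` are hypothetical names.

```python
import heapq
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the paper.
DOC = """
<library>
  <book><title>XML basics</title><chapter><para>streaming XML parsers</para></chapter></book>
  <book><title>Databases</title><chapter><section><para>XML streaming and indexing</para></section></chapter></book>
  <book><title>Cooking</title><chapter><para>bread and soup</para></chapter></book>
</library>
"""

def keyword_score(text, keywords):
    """Toy full-text score: fraction of query keywords present in the text."""
    words = set(text.lower().split())
    return sum(k.lower() in words for k in keywords) / len(keywords)

def top_k(root, keywords, k, penalty=0.5):
    """Rank <book> elements against the template book/chapter/para.

    Exact match (para is a child of chapter): structural score 1.0.
    Relaxed match (para is only a descendant of chapter): score `penalty`.
    The combined score is the product of both scores (an assumption).
    """
    scored = []
    for book in root.iter("book"):
        best = 0.0
        for chapter in book.findall("chapter"):
            exact = chapter.findall("para")       # direct children only
            for para in chapter.iter("para"):     # children and deeper descendants
                structural = 1.0 if para in exact else penalty
                text = "".join(para.itertext())
                best = max(best, structural * keyword_score(text, keywords))
        if best > 0:
            scored.append((best, book.findtext("title")))
    # Keep only the k highest-scoring answers.
    return heapq.nlargest(k, scored)

root = ET.fromstring(DOC)
print(top_k(root, ["xml", "streaming"], 2))
```

In this toy setup the book whose paragraph sits directly under a chapter outranks the one whose paragraph is nested inside a section, even though both contain all the keywords. A strict XPath evaluation of `book/chapter/para` would drop the second book entirely.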

Published in

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN: 1581138598
DOI: 10.1145/1007568

Copyright © 2004 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 785 of 4,003 submissions (20%)
