Research article, DOI: 10.1145/1458082.1458155

A heuristic approach for checking containment of generalized tree-pattern queries

Published: 26 October 2008

Abstract

Query processing techniques for XML data have focused mainly on tree-pattern queries (TPQs). However, two needs have recently motivated query languages that relax the complete specification of a tree pattern: querying XML data sources whose structure is very complex or not fully known to the user, and integrating multiple XML data sources with different structures. To implement the processing of such languages in current DBMSs, their containment problem has to be solved efficiently.
In this paper, we consider a query language that generalizes TPQs by allowing the partial specification of a tree pattern. Partial tree-pattern queries (PTPQs) constitute a large fragment of XPath that flexibly permits the specification of a broad range of queries, from keyword queries without structure, to queries with partial specification of the structure, to complete TPQs. We address the containment problem for PTPQs. This problem becomes more complex in the context of PTPQs because the partial specification of the structure allows new, non-trivial structural expressions to be inferred from those explicitly specified in a query. We show that the containment problem cannot be characterized by homomorphisms between PTPQs, even when PTPQs are put in a canonical form that comprises all derived structural expressions. We provide necessary and sufficient conditions for this problem in terms of homomorphisms between PTPQs and (a possibly exponential number of) TPQs. To cope with the high complexity of PTPQ containment, we suggest a heuristic approach that trades accuracy for speed. An extensive experimental evaluation shows that this heuristic approach can be efficiently implemented in a query optimizer.
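
As background for the homomorphism-based conditions mentioned above, the following is a minimal sketch of a homomorphism test between two ordinary TPQs with child (`/`) and descendant (`//`) edges and no wildcards. It is not the PTPQ procedure or the heuristic of this paper; the names `Node`, `maps_to`, and `has_homomorphism` are hypothetical and chosen only for illustration. The sketch assumes the standard convention that a homomorphism from a pattern Q2 to a pattern Q1 is enough to conclude that Q1 is contained in Q2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree-pattern node (hypothetical representation, not from the paper)."""
    label: str
    # Each entry is (axis, child), where axis is "child" (/) or "desc" (//).
    edges: list = field(default_factory=list)


def descendants(n):
    """Yield every proper descendant of n in the pattern tree."""
    for _, c in n.edges:
        yield c
        yield from descendants(c)


def maps_to(q, p):
    """True if the subtree rooted at q can be mapped into p's subtree with q -> p."""
    if q.label != p.label:
        return False
    for axis, qc in q.edges:
        if axis == "child":
            # A child edge must be mapped to a child edge.
            candidates = [pc for a, pc in p.edges if a == "child"]
        else:
            # A descendant edge may be mapped to any downward path.
            candidates = list(descendants(p))
        if not any(maps_to(qc, pc) for pc in candidates):
            return False
    return True


def has_homomorphism(q_from, q_to):
    """Root-preserving homomorphism from pattern q_from to pattern q_to."""
    return maps_to(q_from, q_to)


# Q1 = a/b/c and Q2 = a//c (illustrative patterns).
q1 = Node("a", [("child", Node("b", [("child", Node("c"))]))])
q2 = Node("a", [("desc", Node("c"))])

print(has_homomorphism(q2, q1))  # True: a//c maps into a/b/c, so Q1 is contained in Q2
print(has_homomorphism(q1, q2))  # False: a//c has no child edge to carry a/b
```

According to the abstract, a check of this kind between two PTPQs does not characterize containment, which is why the stated conditions compare a PTPQ against a possibly exponential number of TPQs instead.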

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. partial tree-pattern query
  2. query containment
  3. xml

Qualifiers

  • Research-article

Conference

CIKM '08: Conference on Information and Knowledge Management
October 26 - 30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
