An Original Semantics to Keyword Queries for XML Using Structural Patterns

Theodoratos, Dimitri; Wu, Xiaoying

doi:10.1007/978-3-540-71703-4_61

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4443)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

XML is by now the de facto standard for exporting and exchanging data on the web. The need to query XML data sources whose structure is not fully known to the user, and the need to integrate multiple data sources with different tree structures, have recently motivated the proposal of keyword-based techniques for querying XML documents. The semantics adopted by these approaches aims to restrict the answers to meaningful ones. However, these approaches suffer from low precision, while more recent ones with improved precision suffer from low recall.

In this paper, we introduce an original approach for assigning semantics to keyword queries for XML documents. We exploit index graphs (structural summaries of the data) to extract tree patterns that return meaningful answers. In contrast to previous approaches, which operate locally on the data to compute meaningful answers (usually by computing lowest common ancestors), our approach operates globally on index graphs to detect and exploit meaningful tree patterns. We implemented our approach and evaluated it experimentally on DBLP-based data sets with irregularities. A comparison with previous approaches shows that it finds all the meaningful answers where the others fail (perfect recall). Further, it outperforms approaches with similar recall at excluding meaningless answers (better precision). Since our approach is based on tree-pattern query evaluation, it can easily be implemented on top of an XQuery engine.
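The difference between working locally on data nodes and globally on a structural summary can be illustrated with a small sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a DataGuide-style path summary (one summary node per distinct root-to-node label path) as a stand-in for an index graph, matches keywords only against an element's own text, and takes the longest label path shared by the keyword matches as a candidate pattern root. The function names `path_summary`, `keyword_paths`, and `common_prefix` and the toy DBLP-like document are hypothetical, and the paper's criteria for excluding meaningless patterns are not modeled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_summary(root):
    """Group elements by their root-to-node label path.

    Hypothetical helper: each distinct label path becomes one summary node
    whose extent is the list of elements reached by that path (a
    DataGuide-style summary of a tree, used here in place of an index graph).
    """
    summary = defaultdict(list)

    def walk(elem, path):
        path = path + (elem.tag,)
        summary[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, ())
    return dict(summary)


def keyword_paths(summary, keyword):
    """Summary paths with at least one element whose own text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted(
        path
        for path, extent in summary.items()
        if any(kw in (elem.text or "").lower() for elem in extent)
    )


def common_prefix(paths):
    """Longest label path shared by all given paths: an 'LCA' taken on the summary, not on data nodes."""
    if not paths:
        return ()
    prefix = list(paths[0])
    for path in paths[1:]:
        i = 0
        while i < min(len(prefix), len(path)) and prefix[i] == path[i]:
            i += 1
        prefix = prefix[:i]
    return tuple(prefix)


# Toy document (hypothetical), loosely shaped like DBLP records.
doc = ET.fromstring(
    "<dblp>"
    "<article><author>Wu</author><title>XML keyword search</title></article>"
    "<inproceedings><author>Theodoratos</author><title>Tree patterns</title></inproceedings>"
    "</dblp>"
)
summary = path_summary(doc)
wu_paths = keyword_paths(summary, "Wu")    # [('dblp', 'article', 'author')]
xml_paths = keyword_paths(summary, "XML")  # [('dblp', 'article', 'title')]
print(common_prefix([wu_paths[0], xml_paths[0]]))  # ('dblp', 'article')
```

Because the shared prefix is computed once over label paths in the summary rather than per pair of data nodes, a single computation covers every `article` element in the document. The resulting candidate could then be phrased as a tree pattern such as `//article[contains(author, "Wu")][contains(title, "XML")]` and handed to an XPath or XQuery engine. Deciding whether such a pattern is meaningful is what the paper's semantics addresses, and this sketch does not attempt it.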

Author information

Authors and Affiliations

Department of Computer Science, New Jersey Institute of Technology, USA
Dimitri Theodoratos & Xiaoying Wu

Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

About this paper

Cite this paper

Theodoratos, D., Wu, X. (2007). An Original Semantics to Keyword Queries for XML Using Structural Patterns. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_61

DOI: https://doi.org/10.1007/978-3-540-71703-4_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science, Computer Science (R0)