Abstract
XML is by now the de facto standard for exporting and exchanging data on the web. The need to query XML data sources whose structure is not fully known to the user, and the need to integrate multiple data sources with different tree structures, have recently motivated keyword-based techniques for querying XML documents. The semantics adopted by these approaches aim to restrict the answers to meaningful ones. However, these approaches suffer from low precision, while more recent ones with improved precision suffer from low recall.
In this paper, we introduce an original approach for assigning semantics to keyword queries for XML documents. We exploit index graphs (a structural summary of the data) to extract tree patterns that return meaningful answers. In contrast to previous approaches, which operate locally on the data to compute meaningful answers (usually by computing lowest common ancestors), our approach operates globally on index graphs to detect and exploit meaningful tree patterns. We implemented and experimentally evaluated our approach on DBLP-based data sets with irregularities. A comparison with previous approaches shows that it finds all the meaningful answers where the others fail (perfect recall). Further, it outperforms approaches with similar recall in excluding meaningless answers (better precision). Since our approach is based on tree-pattern query evaluation, it can be easily implemented on top of an XQuery engine.
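To make the notion of a structural summary concrete, the following is a minimal sketch, assuming the simplest form of index graph: a path summary in which all elements sharing the same root-to-node label path are merged into one summary node. The abstract does not define the paper's index graphs, so this construction, the sample document, and the helper names `build_path_summary` and `summary_nodes_matching` are illustrative assumptions rather than the authors' method.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical DBLP-like sample; not taken from the paper's data sets.
SAMPLE = """
<bib>
  <article><title>XML Search</title><author>Smith</author></article>
  <article><title>Index Graphs</title><author>Jones</author><author>Smith</author></article>
  <book><title>Databases</title><editor>Smith</editor></book>
</bib>
"""

def build_path_summary(root):
    """Merge all elements with the same root-to-node label path into one summary node.

    Returns a dict mapping each label path (a tuple of tags) to the list of
    text values of the elements it summarizes (its extent).
    """
    extent = defaultdict(list)

    def visit(elem, path):
        path = path + (elem.tag,)
        extent[path].append((elem.text or "").strip())
        for child in elem:
            visit(child, path)

    visit(root, ())
    return dict(extent)

def summary_nodes_matching(summary, keyword):
    """Return the label paths whose extent contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted(
        path for path, texts in summary.items()
        if any(kw in text.lower() for text in texts)
    )

summary = build_path_summary(ET.fromstring(SAMPLE))
smith_paths = summary_nodes_matching(summary, "Smith")
for path in smith_paths:
    print("/".join(path))
```

In this sketch a keyword is first located in the summary rather than in the data, which shows that "Smith" occurs in two structurally different roles (article author and book editor). Candidate tree patterns could then be formed over such summary nodes and evaluated against the document, instead of computing lowest common ancestors node by node.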
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Theodoratos, D., Wu, X. (2007). An Original Semantics to Keyword Queries for XML Using Structural Patterns. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_61
DOI: https://doi.org/10.1007/978-3-540-71703-4_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science, Computer Science (R0)