
An Original Semantics to Keyword Queries for XML Using Structural Patterns

  • Conference paper
Advances in Databases: Concepts, Systems and Applications (DASFAA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4443)


Abstract

XML is by now the de facto standard for exporting and exchanging data on the web. Two needs have recently motivated keyword-based techniques for querying XML documents: querying XML data sources whose structure is not fully known to the user, and integrating multiple data sources with different tree structures. The semantics adopted by these approaches aim to restrict the answers to meaningful ones. However, the earlier approaches suffer from low precision, while more recent ones that improve precision suffer from low recall.

In this paper, we introduce an original approach for assigning semantics to keyword queries for XML documents. We exploit index graphs (a structural summary of the data) to extract tree patterns that return meaningful answers. In contrast to previous approaches, which operate locally on the data to compute meaningful answers (usually by computing lowest common ancestors), our approach operates globally on index graphs to detect and exploit meaningful tree patterns. We implemented our approach and evaluated it experimentally on DBLP-based data sets with irregularities. A comparison with previous approaches shows that it finds all the meaningful answers where the others fail (perfect recall). Further, it outperforms approaches with similar recall in excluding meaningless answers (better precision). Since our approach is based on tree-pattern query evaluation, it can be easily implemented on top of an XQuery engine.
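To make the idea of a structural summary concrete, the following is a minimal sketch and not the construction used in the paper. It assumes one simple kind of summary for tree-shaped data, in which elements are grouped by their root-to-node label path, and it then reports which summary nodes contain a given keyword. The sample document and the names `build_path_summary` and `summary_nodes_for_keyword` are hypothetical and serve illustration only; the paper's index graphs and its tree-pattern extraction may be defined differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: an irregular bibliography in which titles
# occur at different depths.
DOC = """
<bib>
  <article>
    <author>Smith</author>
    <title>XML Search</title>
  </article>
  <book>
    <author>Jones</author>
    <title>Databases</title>
    <chapter><title>XML Search</title></chapter>
  </book>
</bib>
"""


def build_path_summary(root):
    """Group elements by root-to-node label path (an assumed, simple summary)."""
    summary = defaultdict(list)

    def visit(node, path):
        path = path + (node.tag,)
        summary[path].append(node)  # the extent of this summary node
        for child in node:
            visit(child, path)

    visit(root, ())
    return summary


def summary_nodes_for_keyword(summary, keyword):
    """Return the label paths whose extent holds an element matching the keyword.

    An element matches if its tag equals the keyword or its direct text
    contains the keyword as a whitespace-separated word (case-insensitive).
    """
    kw = keyword.lower()
    hits = set()
    for path, nodes in summary.items():
        for node in nodes:
            words = (node.text or "").lower().split()
            if kw == node.tag.lower() or kw in words:
                hits.add(path)
                break
    return hits


root = ET.fromstring(DOC)
summary = build_path_summary(root)

smith_paths = sorted("/".join(p) for p in summary_nodes_for_keyword(summary, "Smith"))
xml_paths = sorted("/".join(p) for p in summary_nodes_for_keyword(summary, "XML"))

print(smith_paths)
print(xml_paths)
```

Working on the summary rather than on the data means that candidate structural relationships between keyword occurrences (here, between an author path and the two distinct title paths) can be examined once per label path instead of once per data node. How such candidates are turned into meaningful tree patterns is the subject of the paper and is not reproduced in this sketch.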




Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Theodoratos, D., Wu, X. (2007). An Original Semantics to Keyword Queries for XML Using Structural Patterns. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_61


  • DOI: https://doi.org/10.1007/978-3-540-71703-4_61

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71702-7

  • Online ISBN: 978-3-540-71703-4

  • eBook Packages: Computer Science, Computer Science (R0)
