Exploiting Semantic Tags in XML Retrieval

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6203))

Abstract

Using the new semantically annotated Wikipedia XML corpus, we investigate two research questions. First, do the structural constraints in content-and-structure (CAS) queries help retrieval over an XML document collection that contains semantically rich tags? Second, how can semantic tag information be exploited to improve content-only (CO) queries, given that most users prefer to express queries in the simplest form? In this paper, we describe and analyze our comparison of CO and CAS queries over the document collection of the INEX 2009 ad hoc track, and we propose a method that improves the effectiveness of CO queries by enriching element content representations with semantic tags. Our results show that enriching XML element representations with semantic tags is effective in improving early precision, while strict interpretation of CAS queries is generally superior in terms of average precision.
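
As a rough illustration of what enriching an element's content representation with semantic tags could look like, the following minimal sketch appends the tag names on each element's path to that element's bag of words. It is not the method of the paper, whose details are not given in the abstract; the function name `enriched_representations`, the sample document, and the choice to add ancestor tags and split tag names on underscores are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def enriched_representations(xml_text):
    """Return a list of (path, tokens) pairs, one per element.

    Hypothetical scheme: tokens are the element's text content followed by
    the lowercased tag names of the element and its ancestors.
    """
    root = ET.fromstring(xml_text)
    results = []

    def visit(elem, ancestors):
        tags = ancestors + [elem.tag]
        # Text of the element and all of its descendants, whitespace-tokenized.
        content = " ".join(elem.itertext()).split()
        # Tag names become extra index terms; multi-word tags are split on "_".
        tag_tokens = [part.lower() for tag in tags for part in tag.split("_")]
        results.append(("/" + "/".join(tags), content + tag_tokens))
        for child in elem:
            visit(child, tags)

    visit(root, [])
    return results


# Made-up sample document with semantic tags (an assumption for illustration).
SAMPLE = (
    "<article><physicist>Albert Einstein "
    "<award>Nobel Prize</award></physicist></article>"
)

reps = dict(enriched_representations(SAMPLE))
for path, tokens in reps.items():
    print(path, tokens)
```

Under this scheme, a CO query such as "physicist award" could match the `award` element even though neither word occurs in its text, which is the kind of effect the enrichment is meant to achieve.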

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Q., Li, Q., Wang, S., Du, X. (2010). Exploiting Semantic Tags in XML Retrieval. In: Geva, S., Kamps, J., Trotman, A. (eds) Focused Retrieval and Evaluation. INEX 2009. Lecture Notes in Computer Science, vol 6203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14556-8_15

  • DOI: https://doi.org/10.1007/978-3-642-14556-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14555-1

  • Online ISBN: 978-3-642-14556-8

  • eBook Packages: Computer Science (R0)
