DOI: 10.1145/1031171.1031247
Article

Processing content-oriented XPath queries

Published: 13 November 2004

Abstract

Document-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the mark-up allows users to add structural constraints to their IR queries. Hybrid queries tend to be more expressive, which should lead, in principle, to better retrieval performance. In practice, processing these hybrid queries within an IR system turns out to be far from trivial, because a delicate balance between structural and content information needs to be struck. We propose an approach to processing such hybrid content-and-structure queries that decomposes a query into multiple content-only queries whose results are then combined in ways determined by the structural constraints of the original query. We evaluate our methods using the INEX 2003 test suite, and show (1) that processing content-oriented XPath queries effectively is non-trivial, (2) that effectiveness differs across topic types, but (3) that with appropriate processing methods retrieval effectiveness can improve.
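The decomposition idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the scoring function, the multiplicative combination rule, the `Element` type, and all function names are assumptions made for illustration. It takes a hybrid query of the form `//article[about(., 'xml retrieval')]//sec[about(., 'length normalization')]` (an `about()` filter in the style of INEX content-and-structure topics, assumed here), runs one content-only query per step, and then combines the two result lists using the ancestor/descendant constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    path: str  # absolute location, e.g. "/article[1]/sec[2]"
    tag: str   # element name, e.g. "sec"
    text: str  # text content of the element


def content_score(query: str, element: Element) -> float:
    """Toy content-only scorer (assumption): fraction of query terms in the text."""
    terms = query.lower().split()
    words = set(element.text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def run_content_query(query: str, tag: str, elements: list[Element]) -> dict[str, float]:
    """Score every element with the given tag against one content-only query."""
    return {e.path: content_score(query, e) for e in elements if e.tag == tag}


def combine(ancestor_scores: dict[str, float],
            target_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Apply the structural constraint: a target only counts if it lies below
    a scored ancestor. Multiplying the scores is an illustrative choice."""
    combined = {}
    for path, score in target_scores.items():
        best_ancestor = max(
            (a for p, a in ancestor_scores.items() if path.startswith(p + "/")),
            default=0.0,
        )
        if score * best_ancestor > 0:
            combined[path] = score * best_ancestor
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical mini-collection, flattened into a list of elements.
collection = [
    Element("/article[1]", "article", "xml retrieval with language models"),
    Element("/article[1]/sec[1]", "sec", "length normalization for element retrieval"),
    Element("/article[1]/sec[2]", "sec", "related work on databases"),
    Element("/article[2]", "article", "relational databases and indexing"),
    Element("/article[2]/sec[1]", "sec", "element retrieval length normalization"),
]

# Step 1: one content-only query per step of the hybrid query.
articles = run_content_query("xml retrieval", "article", collection)
sections = run_content_query("length normalization", "sec", collection)

# Step 2: combine the results according to the structural constraint.
ranked = combine(articles, sections)
print(ranked)
```

In this toy run, the section in the second article matches the content of the target step but is dropped because its containing article does not match the first step. A less strict combination rule, such as a weighted sum, would keep it with a lower score; how to balance structure against content in that way is the question the abstract raises.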

Published In

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004
678 pages
ISBN:1581138741
DOI:10.1145/1031171

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. XML retrieval
  2. XPath
  3. content and structure

Qualifiers

  • Article

Conference

CIKM04
Sponsor:
CIKM04: Conference on Information and Knowledge Management
November 8 - 13, 2004
Washington, D.C., USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2013) Visual Evaluation of XPath Queries. Proceedings of the 2013 International Conference on Computational and Information Sciences, pages 434-437. DOI: 10.1109/ICCIS.2013.121. Online publication date: 21-Jun-2013
  • (2012) VXPath. Proceedings of the 2012 Fourth International Conference on Computational and Information Sciences, pages 361-364. DOI: 10.1109/ICCIS.2012.362. Online publication date: 17-Aug-2012
  • (2005) Structured queries in XML retrieval. Proceedings of the 14th ACM international conference on Information and knowledge management, pages 4-11. DOI: 10.1145/1099554.1099559. Online publication date: 31-Oct-2005
  • (2005) Hierarchical Language Models for XML Component Retrieval. Advances in XML Information Retrieval, pages 224-237. DOI: 10.1007/11424550_18. Online publication date: 2005
  • (2004) Mixture models, overlap, and structural hints in XML element retrieval. Proceedings of the Third international conference on Initiative for the Evaluation of XML Retrieval, pages 196-210. DOI: 10.1007/11424550_16. Online publication date: 6-Dec-2004
