DOI: 10.1145/1148170.1148247

Semantic search via XML fragments: a high-precision approach to IR

Published: 06 August 2006


In some IR applications, it is desirable to adopt a high precision search strategy to return a small set of documents that are highly focused and relevant to the user's information need. With these applications in mind, we investigate semantic search using the XML Fragments query language on text corpora automatically pre-processed to encode semantic information useful for retrieval. We identify three XML Fragment operations that can be applied to a query to conceptualize, restrict, or relate terms in the query. We demonstrate how these operations can be used to address four different query-time semantic needs: to specify target information type, to disambiguate keywords, to specify search term context, or to relate select terms in the query. We demonstrate the effectiveness of our semantic search technology through a series of experiments using the two applications in which we embed this technology and show that it yields significant improvement in precision in the search results.
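
The three operations named in the abstract (conceptualize, restrict, relate) can be pictured with a small sketch. This is only an illustration under stated assumptions: the helper names, the annotation types (Person, Organization, Founded), and the exact query strings are hypothetical and are not taken from the paper or from any XML Fragments implementation. The sketch assumes a corpus that has been pre-annotated with such types and shows how a keyword query might be turned into fragment-style queries.

```python
from xml.sax.saxutils import escape


def conceptualize(annotation: str) -> str:
    """Ask for any span carrying the annotation type, without naming a keyword.

    Hypothetical helper: an empty element stands for "some instance of this type".
    """
    return f"<{annotation}></{annotation}>"


def restrict(annotation: str, keywords: str) -> str:
    """Require the keywords to occur inside a span of the given annotation type."""
    return f"<{annotation}>{escape(keywords)}</{annotation}>"


def relate(relation: str, *parts: str) -> str:
    """Nest already-built fragments under a relation type.

    The parts are inserted as-is, so they should come from the helpers above.
    """
    return f"<{relation}>{' '.join(parts)}</{relation}>"


if __name__ == "__main__":
    # Target information type: look for a person near the plain keywords.
    print(conceptualize("Person") + " invented telephone")
    # Keyword disambiguation: "Apple" as an organization, not a fruit.
    print(restrict("Organization", "Apple"))
    # Relating selected terms: both restricted terms must fall under one relation span.
    print(relate("Founded",
                 restrict("Person", "Jobs"),
                 restrict("Organization", "Apple")))
```

The helpers only build strings; how an engine would score or match such fragments against the annotated corpus is outside the scope of this sketch.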







    Published In

    SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
    August 2006
    768 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. XML retrieval
    2. question answering
    3. semantic search


    • Article


    SIGIR06: The 29th Annual International SIGIR Conference
    August 6 - 11, 2006
    Seattle, Washington, USA

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%



    Cited By

    • (2023) Olio: A Semantic Search Interface for Data Repositories. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pp. 1-16. DOI: 10.1145/3586183.3606806. Online publication date: 29-Oct-2023
    • (2019) Bridging Text Visualization and Mining: A Task-Driven Survey. IEEE Transactions on Visualization and Computer Graphics, 25:7, pp. 2482-2504. DOI: 10.1109/TVCG.2018.2834341. Online publication date: 1-Jul-2019
    • (2018) Cost-effective conceptual design using taxonomies. The VLDB Journal — The International Journal on Very Large Data Bases, 27:3, pp. 369-394. DOI: 10.1007/s00778-018-0501-1. Online publication date: 1-Jun-2018
    • (2017) A Review of the State of the Art in Hindi Question Answering Systems. Intelligent Natural Language Processing: Trends and Applications, pp. 265-292. DOI: 10.1007/978-3-319-67056-0_14. Online publication date: 18-Nov-2017
    • (2015) Cost-Effective Conceptual Design for Information Extraction. ACM Transactions on Database Systems, 40:2, pp. 1-39. DOI: 10.1145/2716321. Online publication date: 30-Jun-2015
    • (2014) Which concepts are worth extracting? Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 779-790. DOI: 10.1145/2588555.2610496. Online publication date: 18-Jun-2014
    • (2014) New Dimensions in Semantic Knowledge Management. Towards the Internet of Services: The THESEUS Research Program, pp. 37-50. DOI: 10.1007/978-3-319-06755-1_4. Online publication date: 2-Jul-2014
    • (2013) Semantic Search Engine and Object Database Guidelines for Service Oriented Architecture Models. Technology Diffusion and Adoption, pp. 225-250. DOI: 10.4018/978-1-4666-2791-8.ch015. Online publication date: 2013
    • (2013) Repeatable and reliable semantic search evaluation. Web Semantics: Science, Services and Agents on the World Wide Web, 21, pp. 14-29. DOI: 10.1016/j.websem.2013.05.005. Online publication date: 1-Aug-2013
    • (2013) A Linked Science investigation: enhancing climate change data discovery with semantic technologies. Earth Science Informatics, 6:3, pp. 175-185. DOI: 10.1007/s12145-013-0118-2. Online publication date: 21-Jun-2013
