ABSTRACT
In some IR applications, it is desirable to adopt a high-precision search strategy that returns a small set of documents that are highly focused and relevant to the user's information need. With these applications in mind, we investigate semantic search using the XML Fragments query language on text corpora that have been automatically pre-processed to encode semantic information useful for retrieval. We identify three XML Fragments operations that can be applied to a query to conceptualize, restrict, or relate its terms. We demonstrate how these operations can be used to address four query-time semantic needs: specifying the target information type, disambiguating keywords, specifying search term context, and relating selected terms in the query. We demonstrate the effectiveness of our semantic search technology through a series of experiments with the two applications in which we embed it, and show that it yields a significant improvement in the precision of the search results.
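The three operations named in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the helper functions `conceptualize`, `restrict`, `relate`, and `render`, and the element names `Zoo`, `Person`, `Company`, and `OwnerOf`, are hypothetical, and the concrete syntax accepted by an XML Fragments engine may differ. The sketch only shows how a keyword query might be turned into XML fragments that name a semantic type, constrain a keyword to a type, or tie typed terms together.

```python
import xml.etree.ElementTree as ET


def conceptualize(concept: str) -> ET.Element:
    """Build an empty element that stands for a semantic type.

    Hypothetical helper: the element name is assumed to match an
    annotation type encoded in the pre-processed corpus.
    """
    return ET.Element(concept)


def restrict(concept: str, keywords: str) -> ET.Element:
    """Wrap keywords in a typed element, so that they are meant to match
    only text annotated with that semantic type."""
    elem = ET.Element(concept)
    elem.text = keywords
    return elem


def relate(relation: str, *args: ET.Element) -> ET.Element:
    """Nest typed terms under a relation element to state that they
    should participate in the same relation."""
    elem = ET.Element(relation)
    elem.extend(args)
    return elem


def render(elem: ET.Element) -> str:
    """Serialize a fragment as text, writing empty elements as
    explicit open/close tag pairs."""
    return ET.tostring(elem, encoding="unicode", short_empty_elements=False)


# Conceptualize: ask for any annotated instance of a type.
target_type = render(conceptualize("Zoo"))

# Restrict: match "bush" only where it is annotated as a person.
disambiguated = render(restrict("Person", "bush"))

# Relate: require a person and a company to be linked by a relation.
related = render(
    relate("OwnerOf", restrict("Person", "smith"), conceptualize("Company"))
)

print(target_type)
print(disambiguated)
print(related)
```

In this reading, conceptualization corresponds to specifying a target information type, restriction to disambiguating a keyword or fixing its context, and relation to linking selected terms. That mapping is an interpretation of the abstract rather than a statement taken from it.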
Index Terms
- Semantic search via XML fragments: a high-precision approach to IR