ABSTRACT
We seek to leverage knowledge about information organization in a domain to effectively and efficiently meet targeted information needs of expert users. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. Semantic component instances are segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. This paper describes the semantic components model and presents experimental evidence from a large interactive searching study showing that semantic components, used to supplement full text and keyword indexing and to extend the query language, enhanced the retrieval of domain-specific documents in response to realistic queries posed by real users.