DOI: 10.1145/1321440.1321502
research-article

Semantic components enhance retrieval of domain-specific documents

Published: 06 November 2007

Abstract

We seek to leverage knowledge about information organization in a domain to effectively and efficiently meet targeted information needs of expert users. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. Semantic component instances are segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. This paper describes the semantic components model and presents experimental evidence from a large interactive searching study showing that semantic components, used to supplement full text and keyword indexing and to extend the query language, enhanced the retrieval of domain-specific documents in response to realistic queries posed by real users.



Published In

CIKM '07: Proceedings of the sixteenth ACM conference on information and knowledge management
November 2007
1048 pages
ISBN:9781595938039
DOI:10.1145/1321440

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. semantic components

Qualifiers

  • Research-article

Conference

CIKM07

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2025) Session-Level Normalization and Click-Through Data Enhancement for Session-Based Evaluation. Big Data, pp. 15-33. DOI: 10.1007/978-981-96-1024-2_2. Online publication date: 24-Jan-2025
  • (2016) A study of the use of simulated work task situations in interactive information retrieval evaluations. Journal of Documentation, 72:3, pp. 394-413. DOI: 10.1108/JD-06-2015-0068. Online publication date: 9-May-2016
  • (2012) Time drives interaction. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pp. 105-114. DOI: 10.1145/2348283.2348301. Online publication date: 12-Aug-2012
  • (2011) Evaluating multi-query sessions. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pp. 1053-1062. DOI: 10.1145/2009916.2010056. Online publication date: 24-Jul-2011
  • (2011) Toward a semantic granularity model for domain-specific information retrieval. ACM Transactions on Information Systems, 29:3, pp. 1-46. DOI: 10.1145/1993036.1993039. Online publication date: 22-Jul-2011
  • (2011) PatentRank. Proceedings of the 18th international conference on Neural Information Processing - Volume Part II, pp. 399-405. DOI: 10.1007/978-3-642-24958-7_47. Online publication date: 13-Nov-2011
  • (2010) Reconsideration of the simulated work task situation. Proceedings of the third symposium on Information interaction in context, pp. 155-164. DOI: 10.1145/1840784.1840808. Online publication date: 18-Aug-2010
  • (2009) Deriving clinical query patterns from medical corpora using domain ontologies. Proceedings of the Workshop on Biomedical Information Extraction, pp. 50-56. DOI: 10.5555/1859776.1859784. Online publication date: 18-Sep-2009
  • (2009) Using semantic components to search for domain-specific documents. Information Systems, 34:8, pp. 724-752. DOI: 10.5555/1595075.1595912. Online publication date: 1-Dec-2009
  • (2009) Using semantic components to search for domain-specific documents: An evaluation from the system perspective and the user perspective. Information Systems, 34:8, pp. 724-752. DOI: 10.1016/j.is.2009.04.005. Online publication date: Dec-2009
