Research article · DOI: 10.1145/1321440.1321502

Semantic components enhance retrieval of domain-specific documents

Published: 6 November 2007

ABSTRACT

We seek to leverage knowledge about information organization in a domain to effectively and efficiently meet targeted information needs of expert users. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. Semantic component instances are segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. This paper describes the semantic components model and presents experimental evidence from a large interactive searching study showing that semantic components, used to supplement full text and keyword indexing and to extend the query language, enhanced the retrieval of domain-specific documents in response to realistic queries posed by real users.
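The core idea in the abstract, indexing labeled text segments ("semantic component instances") alongside the full text so that queries can restrict terms to a particular aspect of a document, can be illustrated with a minimal sketch. This is a hypothetical toy index, not the authors' system: the class, labels, and example documents below are all invented for illustration.

```python
from collections import defaultdict

class ComponentIndex:
    """Toy inverted index where each posting key pairs a term with an
    optional semantic-component label; label None means the full text."""

    def __init__(self):
        # (component_label, term) -> set of doc ids
        self.postings = defaultdict(set)

    def add(self, doc_id, full_text, components):
        # Index the full text under the None label (conventional indexing).
        for term in full_text.lower().split():
            self.postings[(None, term)].add(doc_id)
        # Additionally index each labeled segment under its component label,
        # supplementing rather than replacing the full-text postings.
        for label, segment in components.items():
            for term in segment.lower().split():
                self.postings[(label, term)].add(doc_id)

    def search(self, terms, component=None):
        # Conjunctive query; terms may be restricted to one component.
        sets = [self.postings[(component, t.lower())] for t in terms]
        return set.intersection(*sets) if sets else set()

# Illustrative documents with hypothetical component labels.
idx = ComponentIndex()
idx.add("d1",
        "migraine treatment sumatriptan is first line therapy",
        {"treatment": "sumatriptan is first line therapy"})
idx.add("d2",
        "case report adverse reaction to sumatriptan",
        {"adverse-effects": "reaction to sumatriptan"})

# A plain full-text query matches both documents that mention the drug,
# but restricting the term to the 'treatment' component narrows to d1.
print(sorted(idx.search(["sumatriptan"])))               # ['d1', 'd2']
print(sorted(idx.search(["sumatriptan"], "treatment")))  # ['d1']
```

The sketch shows why the paper treats semantic components as complementary to full-text indexing: the component-restricted posting lists add precision for aspect-specific queries without removing anything from ordinary keyword retrieval.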


Published in

CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
November 2007, 1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440
Copyright © 2007 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
