Research article · DOI: 10.1145/1321440.1321502

Semantic components enhance retrieval of domain-specific documents

Published: 6 November 2007

ABSTRACT

We seek to leverage knowledge about information organization in a domain to effectively and efficiently meet targeted information needs of expert users. The semantic components model represents document content in a manner that is complementary to full text and keyword indexing. Semantic component instances are segments of text about a particular aspect of the main topic of the document and may not correspond to structural elements in the document. This paper describes the semantic components model and presents experimental evidence from a large interactive searching study showing that semantic components, used to supplement full text and keyword indexing and to extend the query language, enhanced the retrieval of domain-specific documents in response to realistic queries posed by real users.
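The core idea in the abstract, indexing labeled text segments ("semantic component instances") alongside the full text so that queries can restrict terms to a particular aspect of a document, can be illustrated with a minimal sketch. This is a hypothetical toy index, not the authors' system: the class, labels, and example documents below are all invented for illustration.

```python
from collections import defaultdict

class ComponentIndex:
    """Toy inverted index where each posting key pairs a term with an
    optional semantic-component label; label None means the full text."""

    def __init__(self):
        # (component_label, term) -> set of doc ids
        self.postings = defaultdict(set)

    def add(self, doc_id, full_text, components):
        # Index the full text under the None label (conventional indexing).
        for term in full_text.lower().split():
            self.postings[(None, term)].add(doc_id)
        # Additionally index each labeled segment under its component label,
        # supplementing rather than replacing the full-text postings.
        for label, segment in components.items():
            for term in segment.lower().split():
                self.postings[(label, term)].add(doc_id)

    def search(self, terms, component=None):
        # Conjunctive query; terms may be restricted to one component.
        sets = [self.postings[(component, t.lower())] for t in terms]
        return set.intersection(*sets) if sets else set()

# Illustrative documents with hypothetical component labels.
idx = ComponentIndex()
idx.add("d1",
        "migraine treatment sumatriptan is first line therapy",
        {"treatment": "sumatriptan is first line therapy"})
idx.add("d2",
        "case report adverse reaction to sumatriptan",
        {"adverse-effects": "reaction to sumatriptan"})

# A plain full-text query matches both documents that mention the drug,
# but restricting the term to the 'treatment' component narrows to d1.
print(sorted(idx.search(["sumatriptan"])))               # ['d1', 'd2']
print(sorted(idx.search(["sumatriptan"], "treatment")))  # ['d1']
```

The sketch shows why the paper treats semantic components as complementary to full-text indexing: the component-restricted posting lists add precision for aspect-specific queries without removing anything from ordinary keyword retrieval.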


Published in

CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
November 2007, 1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440
Copyright © 2007 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
