Abstract
Technical terms play an important role as effective queries for many users searching scientific databases. However, for various reasons, authors of scientific literature often use alternative expressions, namely Terminological Paraphrases (TPs), to convey the meanings of specific terms. As a result, some relevant documents cannot be found with conventional term-based queries. In this paper, we propose a method that adapts Predicate Argument Tuples (PATs) to retrieve “de facto relevant documents”, which contain only such TPs and therefore cannot be found by conventional models in an environment restricted to controlled vocabularies. The experiments confirm that PAT-based document retrieval is an effective and promising method for discovering such documents and for improving the recall of terminology-based scientific information access models.
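To make the idea concrete, here is a minimal, hypothetical sketch of PAT-based matching in Python. It is not the authors' implementation. It assumes that each document's predicate argument tuples have already been extracted and lemmatized from a parser's output, and it uses a small hand-written nominalization table (`NOMINALIZATIONS`, hypothetical) to turn a two-word term such as “query expansion” into the tuple (expand, query). In this sketch, a document counts as a hidden relevant document when it contains the tuple but not the literal term.

```python
from typing import Dict, List, Optional, Set, Tuple

# A predicate argument tuple: (predicate lemma, argument head lemma).
PAT = Tuple[str, str]

# Hypothetical table mapping nominalized head nouns to their verb lemmas.
NOMINALIZATIONS: Dict[str, str] = {
    "expansion": "expand",
    "retrieval": "retrieve",
    "classification": "classify",
}


def term_to_pat(term: str) -> Optional[PAT]:
    """Map a two-word term 'modifier head' to (verb(head), modifier), if possible."""
    words = term.lower().split()
    if len(words) != 2:
        return None
    modifier, head = words
    verb = NOMINALIZATIONS.get(head)
    if verb is None:
        return None
    return (verb, modifier)


def find_hidden_docs(
    term: str,
    docs_text: Dict[str, str],
    docs_pats: Dict[str, Set[PAT]],
) -> List[str]:
    """Return IDs of documents that lack the literal term but contain its PAT."""
    pat = term_to_pat(term)
    if pat is None:
        return []
    hidden: List[str] = []
    for doc_id, text in docs_text.items():
        if term.lower() in text.lower():
            continue  # already reachable by conventional term matching
        if pat in docs_pats.get(doc_id, set()):
            hidden.append(doc_id)  # relevant only through a terminological paraphrase
    return hidden


if __name__ == "__main__":
    docs_text = {
        "d1": "We study query expansion for PubMed.",
        "d2": "We expand the query with MeSH headings.",
        "d3": "We rank documents with a language model.",
    }
    # Tuples assumed to be pre-extracted and lemmatized by a parser.
    docs_pats = {
        "d1": {("study", "expansion")},
        "d2": {("expand", "query")},
        "d3": {("rank", "document")},
    }
    print(find_hidden_docs("query expansion", docs_text, docs_pats))  # ['d2']
```

Exact substring matching of the term stands in, crudely, for a conventional term-based retriever; a practical system would also need PAT extraction, lemmatization and ranking.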
Notes
Google Scholar (http://scholar.google.com/), PubMed (http://www.ncbi.nlm.nih.gov/pubmed), Microsoft Academic Search (http://academic.research.microsoft.com/)
The controlled vocabulary of PubMed (http://www.ncbi.nlm.nih.gov/pubmed) is MeSH (Medical Subject Headings). ACM (http://portal.acm.org) maintains the CCS (Computing Classification System), and the Library of Congress (LC, http://catalog.loc.gov/) controls access to its content through LCSH (Library of Congress Subject Headings).
References
Abdou S, Ruck P, Savoy J (2005) Evaluation of stemming, query expansion and manual indexing approaches for the genomic task. In: The Fourteenth Text REtrieval Conference Proceedings (TREC 2005), vol. 501, pp. 863–871
Aronson AR (1996) The effect of textual variation on concept based information retrieval. Proceedings: a conference of the American Medical Informatics Association, pp. 373–377
Bacchin M, Melucci M (2005) Symbol-based query expansion experiments at TREC 2005 genomics track
Choi S-P, Song S, Jung H, Geierhos M, Myaeng S-H (2012) Scientific literature retrieval based on terminological paraphrases using predicate argument tuple. In: SoftTech 2012
Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70(4):687–699
InfoTerm (2010) Terminology standardization. http://www.infoterm.info/standardization/index.php
Lavrenko V, Croft WB (2001) Relevance based language models. In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New Orleans, Louisiana, United States, pp. 120–127
Lu Z, Kim W, Wilbur WJ (2009) Evaluation of query expansion using MeSH in PubMed. Inf Retr 12(1):69–80
Macdonald C, Ounis I (2007) Using relevance feedback in expert search. In: Proceedings of the 29th European conference on IR research. Springer-Verlag, Rome, Italy, pp. 431–443
Miyao Y, Tsujii J (2008) Feature forest models for probabilistic HPSG parsing. Comput Linguist 34(1):35–80
Ponte JM, Croft WB (1998) A language modeling approach to information retrieval. In: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, Melbourne, Australia, pp. 275–281
Srinivasan P (1996) Query expansion and MEDLINE. Inf Process Manag 32(4):431–443
Turtle H, Croft WB (1991) Evaluation of an inference network-based retrieval model. ACM Trans Inf Syst 9(3):187–222
Additional information
This paper is a substantially extended version of the paper accepted and presented at SoftTech 2012 [4]. The extensions include additional experiments, further analysis, and more details about the proposed approaches.
About this article
Cite this article
Choi, SP., Shin, SH., Jung, H. et al. Finding hidden relevant documents buried in scientific documents by terminological paraphrases. Multimed Tools Appl 74, 8729–8743 (2015). https://doi.org/10.1007/s11042-013-1484-y