Abstract
Term mismatch is a serious problem affecting the performance of information retrieval systems. The problem is more severe in the biomedical domain, where many term variations, abbreviations, and synonyms exist. We present query paraphrasing and several term selection combination techniques to overcome this problem. To perform paraphrasing, we use the nouns in a query to generate synonyms from the Metathesaurus. The synthesized paraphrases are ranked using statistical information derived from the corpus, and relevant documents are retrieved using the top n paraphrases. We compare the results with state-of-the-art pseudo relevance feedback based retrieval techniques. To improve the results of the pseudo relevance feedback approach, we introduce two term selection combination techniques, namely Borda Count and Intersection. Surprisingly, the combination techniques performed worse than single term selection techniques. In the pseudo relevance feedback approach, the best algorithms are Information Gain (IG), Rocchio, and Kullback-Leibler Divergence (KLD), which perform 33%, 30%, and 20% better than the other techniques, respectively. However, the paraphrasing technique performs 20% better than the pseudo relevance feedback approach.
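The abstract names KLD as one of the term selection methods for pseudo relevance feedback. It also names Borda Count and Intersection as ways of combining term selectors. A minimal Python sketch of these ideas follows. It is not the authors' implementation, and several parts rest on assumptions:

- The KLD term score p_R(t) log(p_R(t) / p_C(t)) is the common textbook form.
- The positional Borda scoring (n - i points for position i) is an assumed variant.
- Intersection is read as "terms in the top k of every list".
- The toy corpus and the second ranking are hypothetical.

```python
import math
from collections import Counter


def kld_scores(feedback_docs, collection_docs):
    """Score each term in the pseudo-relevant (feedback) documents by its
    contribution to the KL divergence between the feedback-set distribution
    p_R and the collection distribution p_C:
        score(t) = p_R(t) * log(p_R(t) / p_C(t))
    Assumes the feedback documents are part of the collection, so p_C(t) > 0."""
    fb = Counter(t for doc in feedback_docs for t in doc)
    coll = Counter(t for doc in collection_docs for t in doc)
    fb_total, coll_total = sum(fb.values()), sum(coll.values())
    scores = {}
    for term, count in fb.items():
        p_r = count / fb_total
        p_c = coll[term] / coll_total
        scores[term] = p_r * math.log(p_r / p_c)
    return scores


def rank_terms(scores):
    """Sort terms by descending score; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))


def borda_count(rankings, k):
    """Combine ranked term lists (assumed Borda variant): in a list of length n,
    the term at 0-based position i gets n - i points; terms absent from a list
    get 0 points from it. Returns the k terms with the most points."""
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for i, term in enumerate(ranking):
            points[term] += n - i
    return rank_terms(points)[:k]


def intersection(rankings, k):
    """Assumed reading of 'Intersection': keep only terms that occur in the
    top k of every ranking, ordered as in the first ranking."""
    common = set.intersection(*(set(r[:k]) for r in rankings))
    return [t for t in rankings[0][:k] if t in common]


if __name__ == "__main__":
    # Hypothetical tokenized corpus; the first two documents stand in for the
    # top-ranked (pseudo-relevant) documents for a query about heart attacks.
    collection = [
        ["heart", "attack", "heart", "infarction"],
        ["heart", "cardiac", "arrest"],
        ["cancer", "therapy", "cardiac"],
        ["diabetes", "therapy", "insulin"],
    ]
    kld_ranking = rank_terms(kld_scores(collection[:2], collection))
    # Hypothetical ranking from a second term selection method (e.g. Rocchio).
    other_ranking = ["cardiac", "heart", "attack", "arrest", "infarction"]
    print(kld_ranking)                                        # ['heart', 'arrest', 'attack', 'infarction', 'cardiac']
    print(borda_count([kld_ranking, other_ranking], k=3))     # ['heart', 'arrest', 'attack']
    print(intersection([kld_ranking, other_ranking], k=3))    # ['heart', 'attack']
```

In this toy run, "cardiac" receives a negative KLD score because it is relatively more frequent in the whole collection than in the feedback documents. Borda Count can still promote it when another selector ranks it highly. Intersection, by construction, keeps only the terms on which all selectors agree, so it tends to return fewer expansion terms.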
Acknowledgments
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2016R1D1A1A09919551.
About this article
Cite this article
Wasim, M., Asim, M.N., Ghani, M.U. et al. Lexical paraphrasing and pseudo relevance feedback for biomedical document retrieval. Multimed Tools Appl 78, 29681–29712 (2019). https://doi.org/10.1007/s11042-018-6060-z
DOI: https://doi.org/10.1007/s11042-018-6060-z