Lexical paraphrasing and pseudo relevance feedback for biomedical document retrieval

Wasim, Muhammad; Asim, Muhammad Nabeel; Ghani, Muhammad Usman; Rehman, Zahoor Ur; Rho, Seungmin; Mehmood, Irfan

doi:10.1007/s11042-018-6060-z

Published: 04 June 2018

Multimedia Tools and Applications, volume 78, pages 29681–29712 (2019)

Muhammad Wasim¹,² (ORCID: orcid.org/0000-0001-9248-5540),
Muhammad Nabeel Asim²,
Muhammad Usman Ghani¹,²,
Zahoor Ur Rehman³,
Seungmin Rho⁴ &
…
Irfan Mehmood⁵

Abstract

Term mismatch is a serious problem affecting the performance of information retrieval systems. The problem is more severe in the biomedical domain, where many term variations, abbreviations and synonyms exist. We present query paraphrasing and various term selection combination techniques to overcome this problem. To perform paraphrasing, we use nouns to generate synonyms from the Metathesaurus. The synthesized paraphrases are ranked using statistical information derived from the corpus, and relevant documents are retrieved based on the top n selected paraphrases. We compare the results with state-of-the-art pseudo relevance feedback based retrieval techniques. To improve the results of the pseudo relevance feedback approach, we introduce two term selection combination techniques, namely Borda Count and Intersection. Surprisingly, the combination techniques performed worse than single term selection techniques. Within the pseudo relevance feedback approach, the best algorithms are IG, Rocchio and KLD, which perform 33%, 30% and 20% better than the other techniques, respectively. However, the paraphrasing technique performs 20% better than the pseudo relevance feedback approach.
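
The term selection and combination steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the KLD score below uses the commonly cited form p_fb(t) * log(p_fb(t) / p_coll(t)), the Borda Count awards n - i points to the term at position i of an n-term ranking, and Intersection keeps terms found in the top k of every ranking. The function names (kld_term_scores, borda_count, intersection) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def kld_term_scores(feedback_docs, collection_docs):
    """Score each feedback term by p_fb * log(p_fb / p_coll).

    Assumes the feedback documents are a subset of the collection,
    so every feedback term has a non-zero collection probability.
    """
    fb = Counter(term for doc in feedback_docs for term in doc)
    coll = Counter(term for doc in collection_docs for term in doc)
    fb_total = sum(fb.values())
    coll_total = sum(coll.values())
    scores = {}
    for term, count in fb.items():
        p_fb = count / fb_total
        p_coll = coll[term] / coll_total
        scores[term] = p_fb * math.log(p_fb / p_coll)
    return scores


def borda_count(rankings):
    """Merge ranked term lists: position i in an n-term list earns n - i points."""
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for i, term in enumerate(ranking):
            points[term] += n - i
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    return sorted(points, key=lambda t: (-points[t], t))


def intersection(rankings, k):
    """Keep terms that appear in the top k of every ranking, in the first ranking's order."""
    common = set(rankings[0][:k])
    for ranking in rankings[1:]:
        common &= set(ranking[:k])
    return [term for term in rankings[0][:k] if term in common]


# Toy data: tokenized documents, with the feedback set drawn from the collection.
collection = [["gene", "gene", "cell"], ["cell", "cell", "cell", "drug"]]
feedback = [collection[0]]
scores = kld_term_scores(feedback, collection)

# Two hypothetical rankings, e.g. from two different term selection methods.
rank_a = ["gene", "protein", "cell", "tumor"]
rank_b = ["protein", "gene", "tumor", "cell"]
merged = borda_count([rank_a, rank_b])
shared = intersection([rank_a, rank_b], k=3)
```

In this toy setup, "gene" is relatively more frequent in the feedback set than in the collection and receives a positive score, while "cell" receives a negative one. The merged and shared lists would then supply the expansion terms appended to the original query.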


Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2016R1D1A1A09919551.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Engineering and Technology, Lahore, Pakistan
Muhammad Wasim & Muhammad Usman Ghani
Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology (UET), Lahore, Pakistan
Muhammad Wasim, Muhammad Nabeel Asim & Muhammad Usman Ghani
Department of Computer Science Institute of Information Technology, Attock, Pakistan
Zahoor Ur Rehman
Department of Media Software, Sungkyul University, Anyang, Korea
Seungmin Rho
Department of Software, Sejong University, Seoul, Korea
Irfan Mehmood

Corresponding author

Correspondence to Irfan Mehmood.

About this article

Cite this article

Wasim, M., Asim, M.N., Ghani, M.U. et al. Lexical paraphrasing and pseudo relevance feedback for biomedical document retrieval. Multimed Tools Appl 78, 29681–29712 (2019). https://doi.org/10.1007/s11042-018-6060-z

Received: 15 January 2018
Revised: 28 March 2018
Accepted: 24 April 2018
Published: 04 June 2018
Issue Date: November 2019
DOI: https://doi.org/10.1007/s11042-018-6060-z
