Lexical paraphrasing and pseudo relevance feedback for biomedical document retrieval

Multimedia Tools and Applications

Abstract

Term mismatch is a serious problem affecting the performance of information retrieval systems. The problem is more severe in the biomedical domain, where many term variations, abbreviations and synonyms exist. We present query paraphrasing and several term selection combination techniques to overcome this problem. To perform paraphrasing, we use nouns to generate synonyms from the Metathesaurus. The newly synthesized paraphrases are ranked using statistical information derived from the corpus, and relevant documents are retrieved using the top n selected paraphrases. We compare the results with state-of-the-art retrieval techniques based on pseudo relevance feedback. To improve the results of the pseudo relevance feedback approach, we introduce two term selection combination techniques, namely Borda Count and Intersection. Surprisingly, the combination techniques performed worse than single term selection techniques. Within the pseudo relevance feedback approach, the best algorithms are IG, Rocchio and KLD, which perform 33%, 30% and 20% better than the other techniques, respectively. However, the paraphrasing technique performs 20% better than the pseudo relevance feedback approach.
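
As a rough illustration of the two combination ideas named in the abstract, the following minimal sketch merges ranked lists of candidate expansion terms using a standard Borda count and a top-k intersection. The abstract does not give the paper's exact formulations, so the scoring rule, the tie-breaking, the function names and the example term lists are all assumptions made for illustration only.

```python
from collections import defaultdict


def borda_count(rankings, top_k):
    """Merge ranked term lists with a standard Borda count (assumed variant).

    A term at zero-based position i in a list of n terms earns n - i points;
    terms missing from a list earn nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, term in enumerate(ranking):
            scores[term] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda term: (-scores[term], term))
    return ordered[:top_k]


def intersection(rankings, top_k):
    """Keep only terms found in the top_k of every list (assumed variant).

    The surviving terms are returned in the order of the first list.
    """
    common = set(rankings[0][:top_k])
    for ranking in rankings[1:]:
        common &= set(ranking[:top_k])
    return [term for term in rankings[0][:top_k] if term in common]


# Hypothetical ranked expansion terms from three term selection methods.
# These lists are invented for the example and are not results from the paper.
ig = ["insulin", "glucose", "diabetes", "renal"]
rocchio = ["glucose", "insulin", "therapy", "diabetes"]
kld = ["insulin", "diabetes", "glucose", "therapy"]

print(borda_count([ig, rocchio, kld], top_k=3))
print(intersection([ig, rocchio, kld], top_k=3))
```

In this sketch, Borda count rewards terms that rank high across several lists, while intersection is stricter and discards any term that a single method leaves out of its top k, which can shrink the expansion set considerably.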

Notes

  1. https://github.com/wasimbhalli/DSL-BiomedicalQueryExpansion/blob/master/resources/corpus-motivational-example

  2. https://github.com/darthcodus/Lucene-TREC-OHSUMED

Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2016R1D1A1A09919551.

Author information

Corresponding author

Correspondence to Irfan Mehmood.

About this article

Cite this article

Wasim, M., Asim, M.N., Ghani, M.U. et al. Lexical paraphrasing and pseudo relevance feedback for biomedical document retrieval. Multimed Tools Appl 78, 29681–29712 (2019). https://doi.org/10.1007/s11042-018-6060-z

  • DOI: https://doi.org/10.1007/s11042-018-6060-z
