Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study

Yin, Xiaoshi; Huang, Xiangji; Hu, Qinmin; Li, Zhoujun

doi:10.1007/978-3-642-01307-2_100


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

This paper presents an empirical study of combining content-based information retrieval results with linkage-based document importance scores to improve retrieval performance on TREC biomedical literature datasets. In our study, content-based information comes from Okapi, a state-of-the-art information retrieval system based on a probabilistic model. Linkage-based information comes from a citation graph generated from the REFERENCES sections of the documents in a biomedical literature dataset. Three well-known linkage-based ranking algorithms (PageRank, HITS and InDegree) are applied to the citation graph to calculate document importance scores. We evaluate on the TREC 2007 Genomics dataset, which contains 162,259 biomedical documents. Our approach achieves the best document-based MAP among all results reported so far. Our major findings can be summarized as follows. First, even without hyperlinks, linkage information extracted from REFERENCES sections can be used to improve the effectiveness of biomedical information retrieval. Second, the performance of the integrated system is sensitive to the choice of linkage-based ranking algorithm, and a simpler algorithm, InDegree, is more suitable for biomedical literature retrieval.
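The abstract does not state how the content-based and linkage-based scores are combined, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes the citation graph is a list of hypothetical `(citing_id, cited_id)` pairs, that content scores come from some retrieval system as a plain dictionary, and that the two signals are merged by a linear interpolation of min-max normalized scores with a hypothetical `weight` parameter. InDegree is a plain count of incoming citations; the PageRank function is a textbook power iteration included for comparison.

```python
from collections import defaultdict


def indegree_scores(citations):
    """Count incoming citations per document, ignoring self-citations."""
    scores = defaultdict(int)
    for citing, cited in citations:
        if citing != cited:
            scores[cited] += 1
    return dict(scores)


def pagerank_scores(citations, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over (citing, cited) edges."""
    nodes = sorted({doc for edge in citations for doc in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for citing, cited in citations:
        if citing != cited:
            out_links[citing].append(cited)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by documents with no outgoing citations is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank


def min_max(scores):
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def rerank(content_scores, link_scores, weight=0.2):
    """Order retrieved documents by an assumed linear mix of both signals.

    `weight` is a hypothetical interpolation parameter: 0 keeps the
    content-only ranking, 1 ranks purely by linkage score.
    """
    content = min_max(content_scores)
    # Retrieved documents missing from the citation graph get a link score of 0.
    link = min_max({doc: link_scores.get(doc, 0.0) for doc in content_scores})
    combined = {
        doc: (1 - weight) * content[doc] + weight * link[doc] for doc in content
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy example with made-up document IDs and scores.
citations = [("a", "c"), ("b", "c"), ("c", "d")]
content_scores = {"a": 10.0, "c": 9.0, "d": 5.0}
print(rerank(content_scores, indegree_scores(citations), weight=0.5))
```

In this toy example, document `c` is cited twice, so with `weight=0.5` it moves ahead of `a` even though its content score is slightly lower. In practice the interpolation weight would need to be tuned on held-out topics, and other combination schemes are equally plausible.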




Author information

Authors and Affiliations

School of Information Technology, York University, Toronto, Ontario, M3J 1P3, Canada
Xiaoshi Yin & Xiangji Huang
Computer Science Department, York University, Toronto, Ontario, M3J 1P3, Canada
Qinmin Hu
School of Computer Science and Engineering, Beihang University, Beijing, 100083, China
Xiaoshi Yin & Zhoujun Li


Editor information

Editors and Affiliations

Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, 12000, Bangkadi, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 10330, Bangkok, Thailand
Boonserm Kijsirikul
Faculty of Science & Engineering, York University, 355 Lumbers Building, 4700 Keele Street, M3J 1P3, Toronto, Ontario, Canada
Nick Cercone
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, 923-1292, Ishikawa, Japan
Tu-Bao Ho


About this paper

Cite this paper

Yin, X., Huang, X., Hu, Q., Li, Z. (2009). Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_100


DOI: https://doi.org/10.1007/978-3-642-01307-2_100
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)
