Abstract
This paper presents an empirical study of combining content-based information retrieval results with linkage-based document importance scores to improve retrieval performance on TREC biomedical literature datasets. In our study, the content-based information comes from Okapi, a state-of-the-art information retrieval system based on a probabilistic model. The linkage-based information comes from a citation graph generated from the REFERENCES sections of a biomedical literature dataset. Three well-known linkage-based ranking algorithms (PageRank, HITS and InDegree) are applied to the citation graph to calculate document importance scores. We use the TREC 2007 Genomics dataset, which contains 162,259 biomedical documents, for evaluation. Our approach achieves the best document-based MAP among all results reported so far. Our major findings can be summarized as follows. First, even without hyperlinks, linkage information extracted from REFERENCES sections can be used to improve the effectiveness of biomedical information retrieval. Second, the performance of the integrated system is sensitive to the choice of linkage-based ranking algorithm, and a simpler algorithm, InDegree, is more suitable for biomedical literature retrieval.
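The general idea of re-ranking content-based results with a citation-based importance score can be illustrated with a minimal sketch. The abstract does not state how the two scores are combined, so the linear interpolation, the min-max normalization, the `weight` parameter, and all function names below are assumptions made for illustration only, not the authors' method. InDegree is taken here in its usual sense: the number of distinct documents that cite a given document.

```python
from collections import defaultdict


def indegree_scores(citations):
    """Count distinct incoming citations per document.

    `citations` is an iterable of (citing_id, cited_id) pairs, e.g. as
    extracted from REFERENCES sections. Self-citations are ignored.
    """
    counts = defaultdict(int)
    for citing, cited in set(citations):  # set() drops duplicate edges
        if citing != cited:
            counts[cited] += 1
    return dict(counts)


def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(content_scores, citations, weight=0.2):
    """Re-rank documents by a weighted sum of content and InDegree scores.

    `weight` (hypothetical) is the share given to the linkage score;
    weight=0.0 reproduces the purely content-based ranking.
    """
    indeg = indegree_scores(citations)
    link = min_max({doc: indeg.get(doc, 0) for doc in content_scores})
    content = min_max(content_scores)
    combined = {
        doc: (1.0 - weight) * content[doc] + weight * link[doc]
        for doc in content_scores
    }
    # Sort by descending combined score; break ties by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


# Toy data: three retrieved documents and the citations among them.
content_scores = {"A": 12.0, "B": 11.5, "C": 7.0}
citations = [("A", "B"), ("C", "B"), ("C", "A")]

print(rerank(content_scores, citations, weight=0.0))  # ['A', 'B', 'C']
print(rerank(content_scores, citations, weight=0.2))  # ['B', 'A', 'C']
```

In the toy data, document B has a slightly lower content score than A but is cited by both other documents, so a modest linkage weight moves it to the top. PageRank or HITS scores could be substituted for `indegree_scores` under the same interpolation scheme.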
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Yin, X., Huang, X., Hu, Q., Li, Z. (2009). Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_100
DOI: https://doi.org/10.1007/978-3-642-01307-2_100
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)