Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

This paper presents an empirical study of combining content-based retrieval results with linkage-based document importance scores to improve retrieval performance on TREC biomedical literature datasets. The content-based information comes from Okapi, a state-of-the-art information retrieval system based on a probabilistic model. The linkage-based information comes from a citation graph generated from the REFERENCES sections of a biomedical literature dataset. Three well-known linkage-based ranking algorithms (PageRank, HITS and InDegree) are applied to the citation graph to calculate document importance scores. We use the TREC 2007 Genomics dataset for evaluation, which contains 162,259 biomedical articles. Our approach achieves the best document-based MAP among all results reported so far. Our major findings can be summarized as follows. First, even without hyperlinks, linkage information extracted from REFERENCES sections can be used to improve the effectiveness of biomedical information retrieval. Second, the performance of the integrated system is sensitive to the choice of linkage-based ranking algorithm, and a simpler algorithm, InDegree, is more suitable for biomedical literature retrieval.
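The general idea of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the Okapi scores and the linkage scores are combined, so the min-max normalization, the linear interpolation, and the `weight` parameter below are assumptions, as are all function names and the toy data. HITS is omitted for brevity.

```python
def indegree_scores(citations):
    """InDegree: count how many distinct documents cite each document."""
    scores = {doc: 0 for doc in citations}
    for src, targets in citations.items():
        for dst in set(targets):
            if dst != src:  # ignore self-citations
                scores[dst] = scores.get(dst, 0) + 1
    return scores


def pagerank_scores(citations, damping=0.85, iterations=50):
    """PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = set(citations)
    for targets in citations.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {doc: 1.0 / n for doc in nodes}
    for _ in range(iterations):
        dangling = sum(rank[d] for d in nodes if not citations.get(d))
        new_rank = {d: (1 - damping) / n + damping * dangling / n for d in nodes}
        for src, targets in citations.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def min_max(scores):
    """Rescale scores to [0, 1] so the two sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(content_scores, link_scores, weight=0.2):
    """Assumed combination: linear interpolation of normalized scores."""
    content = min_max(content_scores)
    link = min_max({d: link_scores.get(d, 0.0) for d in content_scores})
    combined = {d: (1 - weight) * content[d] + weight * link[d] for d in content}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical toy data: each document maps to the documents it cites,
# as if parsed from its REFERENCES section.
citations = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C", "B"]}
# Hypothetical content-based scores standing in for Okapi output.
okapi = {"A": 3.0, "B": 2.9, "C": 2.8, "D": 1.0}

indegree = indegree_scores(citations)
pagerank = pagerank_scores(citations)
print(rerank(okapi, indegree))
print(rerank(okapi, pagerank))
```

In this toy graph the content scores alone rank A first, but C is cited by every other document, so adding the InDegree signal moves C ahead of A. Setting `weight=0` recovers the purely content-based ranking, which makes it easy to compare the two.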

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yin, X., Huang, X., Hu, Q., Li, Z. (2009). Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_100

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_100

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)
