Citation-Based Extraction of Core Contents from Biomedical Articles

  • Conference paper
Trends in Applied Knowledge-Based Systems and Data Science (IEA/AIE 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9799)

Abstract

Retrieval of biomedical articles about specific research issues (e.g., gene-disease associations) is an essential and routine job for biomedical researchers. An article a can be said to be about a research issue r only if its core content (the goal, background, and conclusion of a) focuses on r. In this paper, we present CoreCE (Core Content Extractor), a technique that, given a biomedical article a, extracts the textual core content of a. The core contents extracted from biomedical articles can be used to index the articles so that search engines can retrieve articles about specific research issues more accurately. Development of CoreCE is challenging because the core content of an article a may be expressed in different ways and scattered throughout a. We tackle the challenge by considering the titles of the references cited by a, as well as the passages in a that explain why the references are cited (i.e., the citation passages). Empirical evaluation shows that representing biomedical articles by the core contents extracted by CoreCE significantly improves retrieval of those articles that biomedical experts judge to be about specific gene-disease associations. CoreCE can thus serve as a front-end processor that preprocesses biomedical scholarly articles for subsequent indexing and retrieval by search engines. The contribution is of technical significance to the retrieval and mining of the evidence already published in biomedical literature.

Notes

  1. The database update procedures of Genetics Home Reference and Online Mendelian Inheritance in Man are described at http://ghr.nlm.nih.gov/ExpertReviewers and http://www.omim.org/about, respectively.

  2. Google Scholar is available at https://scholar.google.com.

  3. PubMed is available at http://www.ncbi.nlm.nih.gov/pubmed.

  4. The method PubMed employs to retrieve related articles is described at http://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.Computation_of_Similar_Articl.

  5. When extracting the α words, stopwords are excluded and hence not counted.

  6. We expect that α should be in the range of [5, 15], which is the typical number of words employed to comment on a reference. The expectation will be justified in the experiments reported in Sect. 4.

  7. DisGeNET is available at http://www.disgenet.org/web/DisGeNET/menu/home.

  8. GAD is available at http://geneticassociationdb.nih.gov.

  9. CTD is available at http://ctdbase.org.

  10. PMC provides full-text biomedical articles at http://www.ncbi.nlm.nih.gov/pmc. All articles that are not included in PMC are excluded from the experiments.
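
Notes 5 and 6 mention extracting α words for a citation while skipping stopwords. The following Python sketch illustrates one possible reading of that step and is not CoreCE itself. It assumes bracketed numeric citation markers such as "[3]", a window of α content words on each side of the marker, and a tiny illustrative stopword list; the function names `take_content_words` and `citation_passages` are hypothetical, and none of these details are confirmed by the text shown here.

```python
import re

# Tiny illustrative stopword list (an assumption; the paper's list is not shown here).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "are",
             "was", "were", "to", "with", "by", "for", "that", "this"}

# Assumed citation marker style, e.g. "[7]".
CITATION = re.compile(r"\[\d+\]")
# A token is either a citation marker or an alphanumeric word (hyphens allowed).
TOKEN = re.compile(r"\[\d+\]|[A-Za-z0-9][A-Za-z0-9\-]*")


def take_content_words(tokens, alpha):
    """Return up to alpha tokens, skipping stopwords and citation markers."""
    picked = []
    for tok in tokens:
        if len(picked) == alpha:
            break
        if CITATION.fullmatch(tok) or tok.lower() in STOPWORDS:
            continue  # stopwords are excluded and hence not counted
        picked.append(tok)
    return picked


def citation_passages(text, alpha=10):
    """Map each citation marker to the content words surrounding each of its uses.

    Assumption: the window spans alpha content words before and alpha after
    the marker; the source only states that alpha words are extracted.
    """
    tokens = TOKEN.findall(text)
    passages = {}
    for i, tok in enumerate(tokens):
        if CITATION.fullmatch(tok):
            # Walk backwards from the marker, then restore reading order.
            before = take_content_words(reversed(tokens[:i]), alpha)[::-1]
            after = take_content_words(tokens[i + 1:], alpha)
            passages.setdefault(tok, []).append(before + after)
    return passages


if __name__ == "__main__":
    sample = ("Mutations in BRCA1 are associated with breast cancer [3] "
              "and the risk is elevated in carriers.")
    print(citation_passages(sample, alpha=3))
```

Because stopwords are skipped rather than counted, a window of α = 3 can reach further than three raw tokens from the marker, which matches the behavior described in note 5.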

Acknowledgment

This research was supported by the Ministry of Science and Technology of Taiwan under the grant MOST 104-2221-E-320-005.

Author information

Corresponding author

Correspondence to Rey-Long Liu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, RL. (2016). Citation-Based Extraction of Core Contents from Biomedical Articles. In: Fujita, H., Ali, M., Selamat, A., Sasaki, J., Kurematsu, M. (eds) Trends in Applied Knowledge-Based Systems and Data Science. IEA/AIE 2016. Lecture Notes in Computer Science, vol 9799. Springer, Cham. https://doi.org/10.1007/978-3-319-42007-3_19

  • DOI: https://doi.org/10.1007/978-3-319-42007-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42006-6

  • Online ISBN: 978-3-319-42007-3

  • eBook Packages: Computer Science, Computer Science (R0)
