Abstract
Bibliographic coupling (BC) is an effective measure for estimating the similarity between two scholarly articles (i.e., inter-article similarity). It works on the out-link references of articles (i.e., the references cited by the articles), and is essential for relatedness analysis and topic clustering of scholarly articles. In this paper, we present a new BC measure, DescriptiveBC, which employs the titles of the out-link references to improve BC in two ways: given a target article a, DescriptiveBC provides more accurate information about how (based on numerical inter-article similarity) and why (based on textual descriptive terms) a scholarly article is related to a. Visualization of this information can support the identification, clustering, mapping, and navigation of related evidence in the scientific literature. Empirical evaluation justifies the contributions of DescriptiveBC. Releasing the reference titles of each article is thus helpful for the dissemination of research findings in the scientific literature, and DescriptiveBC can be incorporated into search engines for scholarly articles to help prospective researchers navigate the space of related articles online.
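As a rough illustration of the baseline that the abstract builds on, the following sketch computes classic bibliographic coupling between two articles from their out-link references. It does not implement DescriptiveBC, whose use of reference titles is not specified in this section. The function names, the reference identifiers, and the cosine-style normalization are all assumptions made for illustration.

```python
from math import sqrt


def coupling_strength(refs_a, refs_b):
    """Classic coupling strength: the number of out-link references
    that two articles have in common."""
    return len(set(refs_a) & set(refs_b))


def normalized_coupling(refs_a, refs_b):
    """Cosine-style normalization of the shared-reference count.

    This normalization is an assumption for illustration; it is not
    taken from the paper and is not the DescriptiveBC measure.
    """
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0  # an article without references cannot be coupled
    return len(a & b) / sqrt(len(a) * len(b))


# Hypothetical reference lists, identified by made-up IDs.
article_a = ["ref1", "ref2", "ref3", "ref4"]
article_b = ["ref2", "ref3", "ref5", "ref6"]

print(coupling_strength(article_a, article_b))
print(normalized_coupling(article_a, article_b))
```

A plain count like this indicates how strongly two articles are related but not why, which is the gap the descriptive terms in DescriptiveBC are meant to fill.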
Notes
A basic description of the “KeyWords Plus” service can be found at http://interest.science.thomsonreuters.com/content/WOKUserTips-201010-IN.
DisGeNET is available at http://www.disgenet.org/web/DisGeNET/menu/home.
The database update procedures of Genetics Home Reference and Online Mendelian Inheritance in Man can be found at http://ghr.nlm.nih.gov/ExpertReviewers and http://www.omim.org/about, respectively.
GAD is available at http://geneticassociationdb.nih.gov.
CTD is available at http://ctdbase.org.
PubMed Central is available at http://www.ncbi.nlm.nih.gov/pmc.
The title of PMC1774044 is “Absence of PRSS1 mutations and association of SPINK1 trypsin inhibitor mutations in hereditary and non-hereditary chronic pancreatitis”.
The title of PMC1773194 is “The N34S mutation of SPINK1 (PSTI) is associated with a familial pattern of idiopathic chronic pancreatitis but does not cause the disease”.
The title of PMC1773221 is “Mutations in serine protease inhibitor Kazal type 1 are strongly associated with chronic pancreatitis”.
The title of PMC2928535 is “Inhibition of acinar apoptosis occurs during acute pancreatitis in the human homologue ∆F508 cystic fibrosis mouse”.
A basic description of cationic trypsinogen and serine peptidase can be found at Genetics Home Reference: https://ghr.nlm.nih.gov/gene/PRSS1.
A basic description of the CFTR gene and cystic fibrosis can be found at Genetics Home Reference: https://ghr.nlm.nih.gov/condition/cystic-fibrosis#genes.
A basic description of the erythropoietin (EPO) gene can be found at Genetics Home Reference: https://ghr.nlm.nih.gov/gene/EPO#.
The title of PMC3441831 is “Erythropoietin Receptor Contributes to Melanoma Cell Survival in vivo”.
The title of PMC1386105 is “Signals for stress erythropoiesis are integrated via an erythropoietin receptor–phosphotyrosine-343–Stat5 axis”.
The title of PMC2754516 is “Use of agents stimulating erythropoiesis in digestive diseases”.
The title of PMC1890992 is “Erythropoietin/erythropoietin receptor system is involved in angiogenesis in human neuroblastoma”.
Epoetin alfa is human erythropoietin produced in cell culture.
Acknowledgements
This research was supported by the Ministry of Science and Technology, Taiwan (Grant ID: MOST 104-2221-E-320-005).
Cite this article
Liu, RL. A new bibliographic coupling measure with descriptive capability. Scientometrics 110, 915–935 (2017). https://doi.org/10.1007/s11192-016-2196-7
DOI: https://doi.org/10.1007/s11192-016-2196-7