
Computer models for identifying instrumental citations in the biomedical literature

Scientometrics

Abstract

The most popular method for evaluating the quality of a scientific publication is citation count. This metric assumes that a citation is a positive indicator of the quality of the cited work. That assumption does not always hold, because citations serve many purposes. As a result, citation count is an indirect and imprecise measure of impact. If instrumental citations could be reliably distinguished from non-instrumental ones, existing citation-based metrics could be improved by excluding the non-instrumental citations. A citation was operationally defined as instrumental if either of the following was true: the hypothesis of the citing work was motivated by the cited work, or the citing work could not have been executed without the cited work. This work investigated the feasibility of developing computer models that automatically classify citations as instrumental or non-instrumental. Instrumental citations were manually labeled, and machine learning models were trained on a combination of content and bibliometric features. The experimental results indicate that models based on content and bibliometric features can automatically classify instrumental citations with high predictivity (AUC = 0.86). Additional experiments using independent held-out data and prospective validation show that the models generalize and can handle unseen cases. This work demonstrates that it is feasible to train computer models to automatically identify instrumental citations.
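The operational definition, the filtered citation count it motivates, and the reported evaluation metric can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' pipeline: the `Citation` record, its field names, and the helper functions are hypothetical, the toy data is invented, the content-and-bibliometric machine learning models are not reproduced, and AUC is computed with the standard rank-based (Mann-Whitney) formulation rather than any code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical record: one citing->cited link plus the two manual judgments.
    cited_id: str
    hypothesis_motivated_by_cited: bool
    could_not_execute_without_cited: bool


def is_instrumental(c: Citation) -> bool:
    """Operational definition: instrumental if EITHER condition holds."""
    return c.hypothesis_motivated_by_cited or c.could_not_execute_without_cited


def instrumental_citation_count(citations, cited_id):
    """Citation count for one paper, keeping only instrumental citations."""
    return sum(1 for c in citations if c.cited_id == cited_id and is_instrumental(c))


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, invented data: paper "A" has 3 raw citations, 2 of them instrumental.
cites = [
    Citation("A", True, False),
    Citation("A", False, False),
    Citation("A", False, True),
    Citation("B", False, False),
]
print(instrumental_citation_count(cites, "A"))  # 2

# Hypothetical classifier scores for four citations (1 = instrumental).
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which keeps the definition transparent; for large evaluation sets a sort-based rank computation gives the same value more efficiently.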



Acknowledgments

The authors gratefully acknowledge support from R56 LM007948-04A1 and 1UL1RR029893.

Author information


Correspondence to Lawrence D. Fu.


About this article

Cite this article

Fu, L.D., Aphinyanaphongs, Y. & Aliferis, C.F. Computer models for identifying instrumental citations in the biomedical literature. Scientometrics 97, 871–882 (2013). https://doi.org/10.1007/s11192-013-0983-y


  • DOI: https://doi.org/10.1007/s11192-013-0983-y
