Abstract
Citation counts are commonly used to evaluate the scientific impact of a publication on the general premise that more citations probably mean more endorsements. However, two questionable assumptions underpin this idea: (a) that all citations contribute equally to the paper; and (b) that the endorsement is positive. Neither of these assumptions holds in general. Hence, in this study we examine two components of citations: their purpose, i.e., the reason for the citation, and their polarity, i.e., the citing author’s attitude toward the cited work. Our findings provide a new perspective on the scientific impact of highly cited publications. Our methodology consists of three steps. First, a pre-trained model that combines Word2Vec, a well-known word embedding approach, with a convolutional neural network (CNN) is used to identify citation polarity and purpose. Second, in a set of highly cited papers, we compare eight categories of purpose, ranging from foundational to critical, and three categories of polarity: positive, negative, and neutral. We further explore how different types of papers (those discussing discoveries and those discussing utilitarian topics) influence the evaluation of their scientific impact. Finally, we mine the knowledge (e.g., methods, concepts, tools, or data) that explains the actual scientific impact of a highly cited paper. To demonstrate how combining citation polarity with purpose provides far greater detail about a paper’s scientific impact, we undertake a case study of 370 highly cited journal articles spanning “Biochemistry & Molecular Biology” and “Genetics & Heredity”. The results yield valuable insights into the assumptions behind using citation counts as a metric for evaluating scientific impact.
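The abstract's central point, that two papers with the same citation count can differ in how they are cited, can be illustrated with a small aggregation step. The sketch below assumes each citation has already been labeled with a purpose and one of the three polarities (in the study this labeling is done by the pre-trained Word2Vec + CNN model; here the labels are simply given). The `Citation` class, the `impact_profile` function and the purpose labels "use", "criticism" and "background" are hypothetical names chosen for illustration; they are not the authors' code or their eight-category purpose scheme.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# The three polarity categories named in the abstract.
POLARITIES = ("positive", "neutral", "negative")


@dataclass(frozen=True)
class Citation:
    """One classified citation (hypothetical record type for this sketch)."""

    purpose: str   # hypothetical purpose label, e.g. "use" or "criticism"
    polarity: str  # must be one of POLARITIES


def impact_profile(citations: list[Citation]) -> dict:
    """Summarise a paper's citations beyond the raw count.

    Returns the raw citation count, the share of each polarity, and the
    joint counts of (purpose, polarity) pairs.
    """
    citations = list(citations)
    for c in citations:
        if c.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {c.polarity!r}")
    total = len(citations)
    polarity_counts = Counter(c.polarity for c in citations)
    # Guard against division by zero for an uncited paper.
    polarity_share = {
        p: (polarity_counts[p] / total if total else 0.0) for p in POLARITIES
    }
    joint = Counter((c.purpose, c.polarity) for c in citations)
    return {"count": total, "polarity_share": polarity_share, "joint": dict(joint)}


# Two hypothetical papers with the same citation count but different profiles.
paper_a = [Citation("use", "positive")] * 3 + [Citation("criticism", "negative")]
paper_b = [Citation("background", "neutral")] * 4

profile_a = impact_profile(paper_a)
profile_b = impact_profile(paper_b)
print(profile_a["count"], profile_b["count"])  # 4 4
print(profile_a["polarity_share"])  # {'positive': 0.75, 'neutral': 0.0, 'negative': 0.25}
print(profile_b["joint"])  # {('background', 'neutral'): 4}
```

The profile deliberately keeps the raw count alongside the distributions, so it extends rather than replaces the citation-count metric. How the joint counts might be weighted into a single score is left open here, since the abstract does not specify one.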
Data availability
All data and materials support our published claims and comply with field standards.
Code availability
The software application or custom code supports our published claims and complies with field standards.
Acknowledgements
This work was supported by the General Program of the National Natural Science Foundation of China under Grant Nos. 72074020 and 71774012. The findings and observations in this paper are those of the authors and do not necessarily reflect the views of the supporters.
Funding
This work was supported by the General Program of the National Natural Science Foundation of China under Grant Nos. 72074020 and 71774012.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Heng Huang. The first draft of the manuscript was written by Heng Huang, and all authors commented on subsequent versions of the manuscript. Revisions to the manuscript were guided by Donghua Zhu and Xuefeng Wang. All authors read and approved the final paper.
Ethics declarations
Conflicts of interest
Not applicable.
About this article
Cite this article
Huang, H., Zhu, D. & Wang, X. Evaluating scientific impact of publications: combining citation polarity and purpose. Scientometrics 127, 5257–5281 (2022). https://doi.org/10.1007/s11192-021-04183-8
DOI: https://doi.org/10.1007/s11192-021-04183-8