Abstract
With the development of citation analysis, the number of times a reference is mentioned in a citing paper (citation mention times) is drawing more attention. To extract mention times more conveniently and quickly, this study developed a high-accuracy citation recognition algorithm based on neural networks that automatically extracts the number of citation mentions in citing papers, and assessed its performance on PDF papers with different citation styles. We also used this algorithm to study the distribution of citation mentions and the contribution of citations to citing papers. The results showed that the proposed algorithm is feasible for use in citation-mention-related research and further verified that the statistical distribution of the number of citation mentions conforms to the generalised Pareto distribution. Meanwhile, references mentioned more than twice accounted for about 20–40% of the total and contributed more than other references.
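As a rough illustration of what "citation mention times" means in practice, the sketch below counts how often each reference number appears in body text and computes the share of references mentioned more than twice. It is a simplified stand-in, not the neural-network recognition algorithm developed in this study: it assumes plain text with numeric bracket citations such as `[3]` or `[2-4]`, and the pattern and function names (`CITATION_GROUP`, `expand_group`, `count_mentions`, `share_mentioned_more_than`) are hypothetical.

```python
import re
from collections import Counter

# Assumption: numeric bracket citations such as [3], [1, 4] or [2-5].
# Author-year styles and PDF layout issues are not handled here.
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*[-–,]\s*\d+)*)\]")


def expand_group(group: str) -> list[int]:
    """Expand the inside of one bracket, e.g. '1, 3-5' -> [1, 3, 4, 5]."""
    numbers: list[int] = []
    for part in re.split(r"\s*,\s*", group.strip()):
        bounds = [int(b) for b in re.split(r"\s*[-–]\s*", part)]
        # A single number n becomes range(n, n + 1); 'a-b' becomes a..b inclusive.
        numbers.extend(range(bounds[0], bounds[-1] + 1))
    return numbers


def count_mentions(body_text: str) -> Counter:
    """Count how many times each reference number is mentioned in the text."""
    counts: Counter = Counter()
    for match in CITATION_GROUP.finditer(body_text):
        counts.update(expand_group(match.group(1)))
    return counts


def share_mentioned_more_than(counts: Counter, n_references: int, threshold: int = 2) -> float:
    """Fraction of the reference list whose mention count exceeds the threshold."""
    if n_references == 0:
        return 0.0
    multi = sum(1 for c in counts.values() if c > threshold)
    return multi / n_references


if __name__ == "__main__":
    text = (
        "Prior work [1, 2] counted citations. Later studies [2-4] refined "
        "this [2]. See also [2] and [4]."
    )
    counts = count_mentions(text)
    print(dict(counts))
    print(share_mentioned_more_than(counts, n_references=4))
```

In the toy text, only reference 2 is mentioned more than twice, so the share is one reference out of four. Real PDF input would need text extraction and style-specific handling first, which is the part the study's algorithm addresses.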


Acknowledgements
This study was funded by the National Natural Science Foundation of China (Grant Nos. 71704035 and 71531013). The authors wish to express their sincere appreciation to the editors and reviewers of this paper.
Appendices
Appendix 1: Algorithm used in this study

Appendix 2: Journals used in this study (2013–2017)
Discipline | No. | Journal |
---|---|---|
1. Biochemistry and molecular biology | 1 | Biochimie |
1. Biochemistry and molecular biology | 2 | Journal of Molecular Graphics and Modelling |
2. Biophysics | 3 | Bioelectrochemistry |
2. Biophysics | 4 | Biophysical Chemistry |
3. Computer science | 5 | Computers in Industry |
3. Computer science | 6 | Information and Computation |
4. Computer science and information systems | 7 | Data and Knowledge Engineering |
4. Computer science and information systems | 8 | International Journal of Medical Informatics |
5. Construction and building technology | 9 | Cement and Concrete Composites |
5. Construction and building technology | 10 | Cement and Concrete Research |
6. Engineering, chemical | 11 | Advanced Powder Technology |
6. Engineering, chemical | 12 | International Journal of Adhesion and Adhesives |
7. Engineering, electrical and electronic | 13 | Microelectronics Journal |
7. Engineering, electrical and electronic | 14 | Optical Fiber Technology |
8. Medicine | 15 | Advances in Medical Sciences |
8. Medicine | 16 | Forensic Science International |
9. Operations research and management | 17 | Decision Support Systems |
9. Operations research and management | 18 | Operations Research Letters |
10. Statistics and probability | 19 | Journal of Multivariate Analysis |
10. Statistics and probability | 20 | Stochastic Processes and their Applications |
Cite this article
Wang, W., Wang, Z., Yu, T. et al. Research on citation mention times and contributions using a neural network. Scientometrics 125, 2383–2400 (2020). https://doi.org/10.1007/s11192-020-03711-2
DOI: https://doi.org/10.1007/s11192-020-03711-2