Research on citation mention times and contributions using a neural network

Scientometrics

Abstract

As citation analysis has developed, citation mention times (the number of times a reference is mentioned in the text of a citing paper) have drawn increasing attention. To extract mention times more conveniently and quickly, this study developed a high-accuracy citation recognition algorithm based on neural networks that automatically extracts the number of citation mentions in citing papers, and assessed its performance on PDF papers with different citation styles. We also used the algorithm to study how citation mentions are distributed and how much citations contribute to citing papers. The results show that the proposed algorithm is feasible for citation-mention research and further verify that the number of citation mentions follows a generalised Pareto distribution. References mentioned more than twice accounted for about 20–40% of the total and contributed more than other references.
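To make the quantities in the abstract concrete, the following is a minimal sketch of counting citation mentions and computing the share of references mentioned more than twice. It is not the neural-network algorithm the study describes: it assumes plain text that uses numeric bracket citations such as [3], [1, 4] or [2-5], and the names `count_mentions` and `share_mentioned_more_than` are hypothetical.

```python
import re
from collections import Counter

# Assumption: numeric bracket citations such as [3], [1, 4] or [2-5].
# Other citation styles (author-year, superscripts) are not handled here.
BRACKET = re.compile(r"\[(\d+(?:\s*[-–,]\s*\d+)*)\]")


def count_mentions(text):
    """Return a Counter mapping reference number -> number of in-text mentions."""
    counts = Counter()
    for match in BRACKET.finditer(text):
        # Split a group like "1, 3-5" into the items "1" and "3-5".
        for part in re.split(r"\s*,\s*", match.group(1)):
            bounds = [int(n) for n in re.split(r"\s*[-–]\s*", part)]
            lo, hi = bounds[0], bounds[-1]
            # A range such as 3-5 counts as one mention of each of 3, 4 and 5.
            for ref in range(lo, hi + 1):
                counts[ref] += 1
    return counts


def share_mentioned_more_than(counts, n_refs, threshold=2):
    """Fraction of the n_refs listed references mentioned more than `threshold` times."""
    return sum(1 for c in counts.values() if c > threshold) / n_refs


sample = (
    "Earlier work [1] counted citations once. Later studies [1, 2] and [2-4] "
    "weighted them by frequency, as [1] had suggested."
)
counts = count_mentions(sample)
share = share_mentioned_more_than(counts, n_refs=4)
print(dict(counts), share)
```

In this sample, reference 1 is mentioned three times and is the only one mentioned more than twice, so the share is one in four. A regular expression like this covers a single citation style in already-extracted text, which is a much narrower task than recognising citations in PDF papers with different styles.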

Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant Nos. 71704035 and 71531013). The authors wish to express their sincere appreciation to the editors and reviewers of this paper.

Author information

Corresponding author

Correspondence to Guang Yu.

Appendices

Appendix 1: Algorithm used in this study

[Figure a: algorithm listing; the image did not survive extraction]

Appendix 2: Journals used in this study (2013–2017)

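| Discipline | No. | Journal |
|---|---|---|
| 1. Biochemistry and molecular biology | 1 | Biochimie |
| 1. Biochemistry and molecular biology | 2 | Journal of Molecular Graphics and Modelling |
| 2. Biophysics | 3 | Bioelectrochemistry |
| 2. Biophysics | 4 | Biophysical Chemistry |
| 3. Computer science | 5 | Computers in Industry |
| 3. Computer science | 6 | Information and Computation |
| 4. Computer science and information systems | 7 | Data and Knowledge Engineering |
| 4. Computer science and information systems | 8 | International Journal of Medical Informatics |
| 5. Construction and building technology | 9 | Cement and Concrete Composites |
| 5. Construction and building technology | 10 | Cement and Concrete Research |
| 6. Engineering, chemical | 11 | Advanced Powder Technology |
| 6. Engineering, chemical | 12 | International Journal of Adhesion and Adhesives |
| 7. Engineering, electrical and electronic | 13 | Microelectronics Journal |
| 7. Engineering, electrical and electronic | 14 | Optical Fiber Technology |
| 8. Medicine | 15 | Advances in Medical Sciences |
| 8. Medicine | 16 | Forensic Science International |
| 9. Operations research and management | 17 | Decision Support Systems |
| 9. Operations research and management | 18 | Operations Research Letters |
| 10. Statistics and probability | 19 | Journal of Multivariate Analysis |
| 10. Statistics and probability | 20 | Stochastic Processes and their Applications |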

About this article

Cite this article

Wang, W., Wang, Z., Yu, T. et al. Research on citation mention times and contributions using a neural network. Scientometrics 125, 2383–2400 (2020). https://doi.org/10.1007/s11192-020-03711-2

  • DOI: https://doi.org/10.1007/s11192-020-03711-2
