Citation count prediction as a link prediction problem

Applied Intelligence

Abstract

The citation count is an important indicator of the relevance and significance of academic publications. However, this measure cannot be used for papers that are too new to have accumulated citations. One solution is to estimate future citation counts. Existing work indicates that graph mining techniques lead to the best results. We aim to improve the prediction of future citation counts by introducing a new feature, which is based on frequent graph pattern mining in the citation network constructed from a dataset of scientific publications. Experiments on two real datasets show that the new feature improves the accuracy of citation count prediction and outperforms state-of-the-art features in many cases.

Notes

  1. http://www-kdd.isti.cnr.it/GERM/

References

  1. Pobiedina N, Ichise R (2014) Predicting citation counts for academic literature using graph pattern mining. In: Proceedings of IEA/AIE, pp 109–119

  2. Garfield E (2001) Impact factors, and why they won’t go away. Nature 411(6837):522

  3. Hirsch J (2005) An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America 102(46):16569

  4. Beel J, Gipp B (2009) Google Scholar’s ranking algorithm: The impact of citation counts (an empirical study). In: Proceedings of RCIS, pp 439–446

  5. Bethard S, Jurafsky D (2010) Who should I cite: learning literature search models from citation behavior. In: Proceedings of CIKM, pp 609–618

  6. Callaham M, Wears R, Weber E (2002) Journal prestige, publication bias, and other characteristics associated with citation of published studies in peer-reviewed journals. J Am Med Assoc 287(21):2847–2850

  7. Kulkarni AV, Busse JW, Shams I (2007) Characteristics associated with citation rate of the medical literature. PLOS One 2(5)

  8. Didegah F, Thelwall M (2013) Determinants of research citation impact in nanoscience and nanotechnology. JASIST 64(5):1055–1064

  9. Livne A, Adar E, Teevan J, Dumais S (2013) Predicting citation counts using text and graph mining. In: Proceedings of the iConference 2013 Workshop on Computational Scientometrics: Theory and Applications

  10. Bringmann B, Berlingerio M, Bonchi F, Gionis A (2010) Learning and predicting the evolution of social networks. IEEE Intell Syst 25:26–35

  11. Yan R, Tang J, Liu X, Shan D, Li X (2011) Citation count prediction: learning to estimate future citations for literature. In: Proceedings of CIKM, pp 1247–1252

  12. McGovern A, Friedl L, Hay M, Gallagher B, Fast A, Neville J, Jensen D (2003) Exploiting relational structure to understand publication patterns in high-energy physics. SIGKDD Explorations 5:2003

  13. Yan R, Huang C, Tang J, Zhang Y, Li X (2012) To better stand on the shoulder of giants. In: Proceedings of JCDL, pp 51–60

  14. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  15. Adamic LA, Adar E (2003) Friends and neighbors on the web. Soc Networks 25(3):211–230

  16. Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. JASIST 58(7):1019–1031

  17. Munasinghe L, Ichise R (2012) Time score: A new feature for link prediction in social networks. IEICE Trans 95-D(3):821–828

  18. Shi X, Leskovec J, McFarland DA (2010) Citing for high impact. In: Proceedings of JCDL, pp 49–58

  19. Devroye L, Györfi L, Lugosi G (1996) A Probabilistic Theory of Pattern Recognition. Springer

  20. Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27

  21. Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: A conditional inference framework. J Comp Graph Stat 15(3):651–674

  22. Breiman L, Friedman J, Stone CJ, Olshen R (1984) Classification and Regression Trees. Chapman and Hall/CRC

  23. The R project for statistical computing http://www.r-project.org/ (January 2013)

  24. Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1923

  25. Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45(4):427–437

Author information

Corresponding author

Correspondence to Nataliia Pobiedina.

Additional information

This is an extended and enhanced version of the results published in [1].

About this article

Cite this article

Pobiedina, N., Ichise, R. Citation count prediction as a link prediction problem. Appl Intell 44, 252–268 (2016). https://doi.org/10.1007/s10489-015-0657-y

  • DOI: https://doi.org/10.1007/s10489-015-0657-y
