Predicting citation patterns: defining and determining influence

Brizan, David Guy; Gallagher, Kevin; Jahangir, Arnab; Brown, Theodore

doi:10.1007/s11192-016-1950-1

Published: 03 May 2016

Scientometrics, Volume 108, pages 183–200 (2016)

David Guy Brizan¹,
Kevin Gallagher²,
Arnab Jahangir³ &
…
Theodore Brown¹

973 Accesses
12 Citations
2 Altmetric

Abstract

Definitions for influence in bibliometrics are surveyed and expanded upon in this work. On data composed of the union of DBLP and CiteSeerX, approximately 6 million publications, a relatively small number of features are developed to describe the set, including loyalty and community longevity, two novel features. These features are successfully used to predict the influential set of papers in a series of machine learning experiments. The most predictive features are highlighted and discussed.

Acknowledgments

This research was supported, in part, under National Science Foundation Grants CNS-0958379, CNS-0855217, ACI-1126113 and the City University of New York High Performance Computing Center at the College of Staten Island. The authors also acknowledge the Office of Information Technology at The Graduate Center, CUNY for providing database and server resources that have contributed to the research results reported within this paper. URL: http://it.gc.cuny.edu/.

Author information

Authors and Affiliations

Department of Computer Science, CUNY and CUNY Graduate Center, 365 Fifth Ave, New York, NY, 10016, USA
David Guy Brizan & Theodore Brown
Department of Computer Science, NYU Tandon School of Engineering, 6 MetroTech Center, Brooklyn, NY, 11201, USA
Kevin Gallagher
Department of Computer Science, Hunter College CUNY, 695 Park Avenue, New York, NY, 10065, USA
Arnab Jahangir

Corresponding author

Correspondence to David Guy Brizan.

Appendices

Appendix 1: Features

Table 2 lists all 48 features used in our system. We consider different functionals (for example, the min or max of a set of numbers) to be different features.
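As a rough illustration of how functionals multiply the feature count, the sketch below expands one set of numbers into one named feature per functional. The helper name, the feature name, and the sample values are hypothetical, and only min and max are used because those are the two functionals named above; the actual feature set is the one listed in Table 2.

```python
# Hypothetical functionals: only min and max are named in the text above.
FUNCTIONALS = {"min": min, "max": max}

def expand_features(name, values):
    """Turn one set of numbers into one named feature per functional."""
    return {f"{name}_{label}": fn(values) for label, fn in FUNCTIONALS.items()}

# Hypothetical measurement: prior citation counts of a paper's three authors.
features = expand_features("author_citations", [12, 340, 57])
print(features)
```

Each key in the resulting dictionary counts as its own feature, which is how a small number of underlying measurements can grow into a larger feature list.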

Table 2 Model features

Appendix 2: Clustering performance

Table 3 shows the different aliases for the Quantitative Evaluation of Systems (QEST) conference. We chose this conference because of its relatively small number of entries but its relatively high number of aliases. ID numbers uniquely identify an alias within our database. Lines separate clusters of aliases. Note that there is one large cluster of 15 aliases and many clusters with a single alias.

Table 3 Aliases for the QEST community

Because none of the clusters have aliases belonging to other conferences, the purity of each cluster and of the set of clusters is 1.0. The entropy of this set of clusters is 0.0269, slightly higher than that of the other communities we sampled (0.0231).
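For readers unfamiliar with these two measures, the following is a minimal sketch of the textbook definitions commonly used for document clustering: per-cluster purity and class entropy, each averaged over clusters with weights proportional to cluster size. The function names and the toy data are hypothetical. Under these definitions a clustering with purity 1.0 has a class entropy of exactly 0, so the nonzero entropy reported above presumably comes from a different formulation that is not spelled out on this page; the sketch does not attempt to reproduce that figure.

```python
import math
from collections import Counter

def cluster_purity(labels):
    """Fraction of a cluster's members that carry its most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def cluster_entropy(labels):
    """Shannon entropy (base 2) of the label distribution inside one cluster."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_average(clusters, metric):
    """Size-weighted average of a per-cluster metric over all clusters."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) * metric(c) for c in clusters) / total

# Hypothetical toy data: each inner list holds the true conference of
# every alias assigned to one cluster.
pure = [["QEST"] * 15, ["QEST"], ["QEST"]]
mixed = [["QEST", "QEST", "QEST", "OTHER"], ["OTHER", "OTHER"]]

pure_purity = weighted_average(pure, cluster_purity)
pure_entropy = weighted_average(pure, cluster_entropy)
mixed_purity = weighted_average(mixed, cluster_purity)
mixed_entropy = weighted_average(mixed, cluster_entropy)
```

In the toy data, the first clustering contains aliases of a single conference only, so it is perfectly pure; the second places one alias in the wrong cluster, which lowers purity and raises entropy.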

About this article

Cite this article

Brizan, D.G., Gallagher, K., Jahangir, A. et al. Predicting citation patterns: defining and determining influence. Scientometrics 108, 183–200 (2016). https://doi.org/10.1007/s11192-016-1950-1

Received: 23 November 2015
Published: 03 May 2016
Issue Date: July 2016
DOI: https://doi.org/10.1007/s11192-016-1950-1
