
The rise of hyperprolific authors in computer science: characterization and implications

Scientometrics

Abstract

In this article, we study and characterize the phenomenon of hyperprolific authors, the most productive researchers according to a given repository in a specific period of time. In particular, we investigate and characterize a subset of these authors who show a sudden growth in their number of published articles and coauthors and who concentrate their publications in a few specific journals, which can be seen as anomalous behavior. Using data collected from the DBLP repository and covering the last 10 years, we propose a set of discriminative dimensions (features) aimed at characterizing the behavior of hyperprolific authors, ultimately helping to identify anomalous ones. Moreover, using a strategy based on ranking aggregation to identify the most prominent anomalous authors, we demonstrate that the best dimensions for characterizing such anomalous behavior may vary significantly among authors, but that it is still possible to identify a clear subset of authors who present it. Our results show that the top-ranked (most anomalous) authors behave distinctly from the middle-ranked ones. Indeed, each of the five most anomalous authors published more than 48 journal articles in 2021 while collaborating with more than 1,000 coauthors over their careers. Specifically, one of these authors published more than 140 articles in a single journal.
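The abstract mentions a ranking-aggregation strategy without detailing it on this page. As a purely illustrative sketch, and not necessarily the authors' method, the snippet below aggregates per-feature rankings with a Borda-style mean rank, letting tied scores share the same rank. The feature names, author labels, and values are all hypothetical, and the code assumes that every author has a score for every feature.

```python
from collections import defaultdict


def dense_ranks(scores):
    """Rank authors by descending score; tied scores share the same rank."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of = {value: position for position, value in enumerate(distinct, start=1)}
    return {author: rank_of[value] for author, value in scores.items()}


def aggregate_by_mean_rank(feature_scores):
    """Borda-style aggregation: order authors by mean rank across features.

    Assumes every author appears in every feature (an illustrative
    simplification, not something stated in the article).
    """
    totals = defaultdict(float)
    for scores in feature_scores.values():
        for author, rank in dense_ranks(scores).items():
            totals[author] += rank
    n_features = len(feature_scores)
    mean_rank = {author: total / n_features for author, total in totals.items()}
    # Lower mean rank means "more anomalous"; break ties by author label.
    return sorted(mean_rank, key=lambda author: (mean_rank[author], author))


# Hypothetical feature values for four hypothetical authors.
features = {
    "articles_per_year": {"A": 150, "B": 60, "C": 60, "D": 12},
    "new_coauthors": {"A": 300, "B": 40, "C": 90, "D": 10},
    "journal_concentration": {"A": 0.9, "B": 0.2, "C": 0.5, "D": 0.1},
}
ranking = aggregate_by_mean_rank(features)
print(ranking)
```

A mean-rank scheme is only one of several aggregation options; it is used here because it is easy to inspect by hand.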



Notes

  1. http://uis.unesco.org/apps/visualisations/research-and-development-spending, accessed on April 25, 2022.

  2. https://ncses.nsf.gov/pubs/nsb20206/publication-output-by-region-country-or-economy, accessed on April 25, 2022.

  3. http://dblp.org, accessed on May 14, 2022.

  4. https://scholar.google.com, accessed on May 14, 2022.

  5. https://arxiv.org, accessed on May 14, 2022.

  6. https://pubmed.ncbi.nlm.nih.gov, accessed on May 14, 2022.

  7. https://www.aminer.org, accessed on May 15, 2022.

  8. In this article, we may use the words “researcher” and “author” to refer to the same person, depending on that person's role in the issue under discussion.

  9. We have computed those numbers based on the dataset we have created for our experiments. See Subsection 3.1.

  10. The Empirical Cumulative Distribution Function (ECDF) value for a given point p in the horizontal axis is the fraction of observations of the variable with values less than or equal to p.

  11. https://scholar.google.com/intl/en/scholar/citations.html, accessed on May 16, 2022.

  12. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph, accessed on June 25, 2022.

  13. https://jcr.clarivate.com, accessed on June 19, 2022.

  14. https://incites.help.clarivate.com/Content/Indicators-Handbook/ih-journal-impact-factor.htm, accessed on June 25, 2022.

  15. A mega journal is a type of journal in which the publisher charges authors, rather than readers, for article publication.

  16. https://predatory-publishing.com/how-many-predatory-journals-are-there, accessed on June 19, 2022.

  17. https://www.interacademies.org, accessed on May 9, 2022.

  18. Five temporal scenarios \(\times\) four temporal metrics \(\times\) two summarizations \(+\) the entropy \(+\) the publication intensity.

  19. This is the value obtained by subtracting 0.31 (or 31%) from one, which is equivalent to the portion of the reference set’s plot to the right of the vertical dotted line.

  20. Recall that we use the term hyperproductive for the set of the most productive researchers of the considered time period.

  21. For the sake of clarity, we limit the graph plot to the 10 top-ranked researchers only. The same reasoning applies to the analysis of the features discussed next.

  22. We consider ties as sharing the same rank.

  23. https://www.andrews.edu/~calkins/math/edrm611/edrm05.htm, accessed on January 1, 2022.

  24. All resources needed to reproduce our experiments, including our source code and dataset, are available at https://github.com/edreqm/raise-of-hiperprolific.

  25. We take the number of working days in 2021 as a reference and use the same number for each of the 10 years. Starting from the 365 days of the year, we exclude the 104 weekend days. To be conservative, we do not exclude holidays.

  26. To compute the topics, we first concatenate the titles of all publications to create a document for each author. Then, we eliminate all adverbs, non-English words, and stop words. Finally, we apply the CluWords algorithm to discover the 12 topics and the top 10 words describing them. We chose 12 as the number of topics to resemble the 12 Computer Science subfields defined in https://en.wikipedia.org/wiki/Outline_of_computer_science (accessed on December 11, 2022).

  27. https://pubmed.ncbi.nlm.nih.gov, accessed on June 27, 2022.
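Note 10 defines the ECDF, and note 19 uses its complement. The following minimal sketch shows both computations under that definition. The sample values and function names are hypothetical and are not taken from the article's dataset or code.

```python
def ecdf(observations, p):
    """Fraction of observations with values less than or equal to p."""
    if not observations:
        raise ValueError("ECDF is undefined for an empty sample")
    return sum(1 for value in observations if value <= p) / len(observations)


def fraction_above(observations, p):
    """Complement of the ECDF: fraction of observations strictly greater than p."""
    return 1.0 - ecdf(observations, p)


# Hypothetical yearly publication counts for ten researchers.
counts = [2, 3, 3, 5, 8, 13, 21, 34, 55, 89]
at_or_below_five = ecdf(counts, 5)       # four of the ten values are <= 5
above_five = fraction_above(counts, 5)   # the remaining six values are > 5
print(at_or_below_five, above_five)
```

On an ECDF plot, `fraction_above` corresponds to the portion of the curve to the right of a vertical line drawn at p.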
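The counts described in notes 18 and 25 can be worked out explicitly. This is only the arithmetic implied by those notes; the symbols \(N_{\text{features}}\) and \(D\) are introduced here for illustration and do not appear in the article.

\[
N_{\text{features}} = 5 \times 4 \times 2 + 1 + 1 = 42
\]

\[
D_{\text{year}} = 365 - 104 = 261, \qquad D_{10\ \text{years}} = 10 \times 261 = 2610
\]

Here \(D_{\text{year}}\) is the number of working days per year when only weekends are excluded, and \(D_{10\ \text{years}}\) repeats that number over the 10-year period.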


Acknowledgements

This work is partially supported by the authors' individual research grants from CAPES, CNPq and FAPEMIG, and by the projects MASWeb and INCT-Cyber.

Author information

Corresponding author

Correspondence to Edré Moreira.

Ethics declarations

Conflict of interest

The authors have no financial or non-financial interests to disclose.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Moreira, E., Meira, W., Gonçalves, M.A. et al. The rise of hyperprolific authors in computer science: characterization and implications. Scientometrics 128, 2945–2974 (2023). https://doi.org/10.1007/s11192-023-04676-8


  • DOI: https://doi.org/10.1007/s11192-023-04676-8
