Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases

Scientometrics

Abstract

Recent studies have shown that the coverage of the Scopus and Web of Science (WoS) databases differs substantially. Consequently, a paper's citation count depends on the database used, which makes it difficult to use both databases together. To address this problem, this paper examines whether a percentile- and stochastic-based approach can convert citation counts between the two databases while ensuring time normalization. For this analysis, we collected a dataset of 326,345 papers published in 1987–2017 in the top 10% of source titles in four fields: Industrial and Manufacturing Engineering, Aquatic Science, Social Psychology, and Archaeology. First, we fitted a linear regression model to the citation percentiles of papers indexed in both databases. Second, we combined the predictions of this linear relationship with Monte Carlo simulations to obtain the probability density function of the percentile of a paper in the database from which it is missing. The results indicate that the proposed method can convert the citation counts of articles between Scopus and WoS. It can also predict the citation impact of a paper missing from one database from its citation impact in the other. Tests on subsamples using Lin’s concordance correlation coefficient suggest substantial agreement between estimated and actual citation values. This allows the citation counts of the two databases to be used together, improving the coverage and accuracy of both bibliometric studies and bibliometric indicators.

Acknowledgements

The authors are grateful to two anonymous reviewers for their valuable recommendations for improving the manuscript. We acknowledge the support of the European Regional Development Fund (ERDF) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) and of the Portuguese funding agency FCT (Fundação para a Ciência e a Tecnologia) within project POCI-01-0145-FEDER-031821.

Author information

Corresponding author

Correspondence to Gerson Pech.

About this article

Cite this article

Pech, G., Delgado, C. Percentile and stochastic-based approach to the comparison of the number of citations of articles indexed in different bibliographic databases. Scientometrics 123, 223–252 (2020). https://doi.org/10.1007/s11192-020-03386-9
