Abstract
Predatory journals are a relatively recent phenomenon that has drawn attention from the academic community over the last decade. As the open access (OA) movement has gained momentum, the unchecked growth of predatory journals has harmed academic communication, scholarly publishing, and the effective use of scientific resources. This growth threatens the healthy development of the OA movement and undermines the integrity of research and the research ecosystem. Identifying predatory journals among the massive number of OA journals would help scholars avoid losses of money, reputation, academic influence, and career advancement. Traditional methods for identifying predatory journals rely heavily on the knowledge of domain experts. However, many predatory journals have latent, covert characteristics, and the number of OA journals is growing extremely rapidly, which makes it difficult for experts to pick out predatory journals from the rest. This paper proposes an interpretable machine learning model for early warning of predatory OA journals that identifies predatory journals through an ensemble of multiple machine learning algorithms. Specifically, the proposed methodology first constructs an early warning indicator system for OA journals and integrates multiple machine learning algorithms to compute early warning values for those journals. The SHAP interpretability framework is then introduced to analyze the causal factors behind the early warning risks. To verify the accuracy of the model's causal factors, we conduct a comparative case-study analysis of domestic and foreign medical OA journals. The empirical analysis demonstrates that the ensemble algorithm accurately identifies the risk of predatory OA journals.
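The two-stage idea in the abstract (an ensemble that produces an early warning value, followed by a Shapley-based attribution of that value to individual indicators) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three indicator names, the three toy scoring functions, their coefficients, and the baseline journal are hypothetical and are not taken from the paper. The attribution step computes exact Shapley values against a single baseline by enumerating coalitions, which is the quantity SHAP approximates; it does not use the SHAP library or the authors' actual models.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical indicators; the abstract does not list the real indicator system.
FEATURES = ["self_citation_rate", "apc_usd_thousands", "review_days"]


def logistic_scorer(x):
    """Toy logistic scorer with made-up coefficients."""
    z = 3.0 * x[0] + 0.5 * x[1] - 0.05 * x[2]
    return 1.0 / (1.0 + exp(-z))


def rule_scorer(x):
    """Toy threshold rules standing in for a tree-style learner."""
    score = 0.0
    if x[0] > 0.3:   # unusually high self-citation
        score += 0.5
    if x[2] < 20:    # unusually fast review
        score += 0.5
    return score


def clipped_linear_scorer(x):
    """Toy linear scorer clipped to the range [0, 1]."""
    return min(1.0, max(0.0, 0.8 * x[0] + 0.1 * x[1]))


MODELS = [logistic_scorer, rule_scorer, clipped_linear_scorer]


def ensemble_score(x):
    """Early warning value: unweighted mean of the base scorers."""
    return sum(model(x) for model in MODELS) / len(MODELS)


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


# Hypothetical journal under review and a hypothetical "typical" baseline journal.
journal = [0.45, 2.0, 10.0]
baseline = [0.10, 1.0, 60.0]

contributions = shapley_values(ensemble_score, journal, baseline)
for name, phi in zip(FEATURES, contributions):
    print(f"{name}: {phi:+.3f}")
print(f"early warning value: {ensemble_score(journal):.3f}")
```

The contributions sum to the difference between the journal's early warning value and the baseline's, so each indicator's share of the extra risk can be read off directly. With a realistic number of indicators, exact enumeration becomes infeasible and an approximation such as those in the SHAP library would be needed instead.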
Acknowledgements
This work was supported by the China Scholarship Council.
Funding
Funding was provided by the 2020 Hubei Provincial Social Science Foundation Pre-Funded Projects (Grant No. 20ZD053) and the Social Science Foundation of Shaanxi Province (Grant No. 19CTQ030).
Author information
Contributions
MKL: collected the data, contributed data or analysis tools, performed the analysis, and wrote the manuscript. WJH: commented on the overall framework of the paper, provided revisions, and offered ideas. LTY: collected experimental data, reran experiments, and wrote revisions. ZL: conceived and designed the analysis, wrote the manuscript, designed the figures, and made other contributions.
About this article
Cite this article
Wu, J., Liu, T., Mu, K. et al. Identification and causal analysis of predatory open access journals based on interpretable machine learning. Scientometrics 129, 2131–2158 (2024). https://doi.org/10.1007/s11192-024-04969-6
DOI: https://doi.org/10.1007/s11192-024-04969-6