Abstract
In this paper, machine learning tools were used to identify key features influencing citation impact. Both external information about the papers and information about their quality were considered in constructing the papers’ feature space. Based on this feature space, a soft fuzzy rough set model was used to generate a series of associated feature subsets. A KNN classifier was then used to find the feature subset with the best classification performance. The results show that citation impact can be predicted from objectively assessed factors. Both the papers’ quality and their external features, the latter mainly represented by the reputation of the first author, contribute to future citation impact.
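The subset-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the soft fuzzy rough set step is not reproduced, so the candidate feature subsets are simply assumed to be given, and leave-one-out accuracy is an assumed scoring rule. The function names (`knn_predict`, `loo_accuracy`, `best_subset`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, subset, k=3):
    """Leave-one-out KNN accuracy using only the feature indices in `subset`."""
    projected = [[row[j] for j in subset] for row in X]
    hits = 0
    for i, point in enumerate(projected):
        # Hold out sample i and classify it from all remaining samples.
        train_X = projected[:i] + projected[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, point, k) == y[i]
    return hits / len(projected)


def best_subset(X, y, candidates, k=3):
    """Return (accuracy, subset) for the best-scoring candidate subset."""
    scored = [(loo_accuracy(X, y, s, k), s) for s in candidates]
    return max(scored, key=lambda item: item[0])


# Toy data (assumption): feature 0 separates the classes, feature 1 is noise.
X = [
    [0.10, 5.0], [0.20, 1.0], [0.15, 9.0], [0.05, 3.0],
    [0.90, 4.0], [1.00, 8.0], [0.95, 2.0], [0.85, 6.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # e.g. 0 = low citation impact, 1 = high

# Candidate subsets are assumed to come from a prior feature-evaluation step.
candidates = [(0,), (1,), (0, 1)]
accuracy, subset = best_subset(X, y, candidates)
print(subset, accuracy)
```

On this toy data the informative feature alone scores highest, which mirrors the idea of keeping the subset with the best classification performance. In practice the features would usually be rescaled before computing distances, since KNN is sensitive to feature scale.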
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant Nos. 71003020; 70973031), the special funds of Central College Basic Scientific Research Bursary (Grant No. DL11CB09), and the Postdoctoral Science Foundation of Heilongjiang Province.
Cite this article
Wang, M., Yu, G., An, S. et al. Discovery of factors influencing citation impact based on a soft fuzzy rough set model. Scientometrics 93, 635–644 (2012). https://doi.org/10.1007/s11192-012-0766-x
DOI: https://doi.org/10.1007/s11192-012-0766-x