Discovery of factors influencing citation impact based on a soft fuzzy rough set model

Scientometrics

Abstract

In this paper, machine learning tools are used to identify the key features that influence citation impact. Both external information about the papers and information about their quality are considered in constructing the papers’ feature space. Based on this feature space, a soft fuzzy rough set model is used to generate a series of associated feature subsets. A KNN classifier is then used to find the feature subset with the best classification performance. The results show that citation impact can be predicted from objectively assessed factors. Both the papers’ quality and their external features, mainly represented by the reputation of the first author, contribute to future citation impact.
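The subset-selection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the soft fuzzy rough set step is not reproduced, so the candidate subsets are simply written by hand; the toy data is invented for demonstration; leave-one-out accuracy is an assumed evaluation protocol; and all function names are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, subset, k=3):
    """Leave-one-out KNN accuracy using only the feature indices in `subset`."""
    proj = [[row[j] for j in subset] for row in X]
    hits = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, proj[i], k) == y[i]
    return hits / len(proj)


def best_subset(X, y, candidates, k=3):
    """Score every candidate subset and return the best one plus all scores."""
    scores = {tuple(s): loo_accuracy(X, y, s, k) for s in candidates}
    return max(scores, key=scores.get), scores


# Invented toy data: feature 0 separates the classes, feature 1 is noise.
# Label 1 stands for "highly cited", label 0 for "not highly cited".
X = [[0.10, 0.9], [0.20, 0.1], [0.15, 0.5], [0.30, 0.8],
     [0.90, 0.2], [0.80, 0.9], [0.85, 0.4], [0.70, 0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Hand-written stand-ins for the subsets a rough-set step would produce.
candidates = [(0,), (1,), (0, 1)]

best, scores = best_subset(X, y, candidates)
print(best, scores)
```

In a real study the candidate subsets would come from the feature-evaluation model and the classifier would be validated on held-out papers; the sketch only shows how a classifier's accuracy can rank competing subsets.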

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 71003020; 70973031), the special funds of Central College Basic Scientific Research Bursary (Grant No. DL11CB09), and the Postdoctoral Science Foundation of Heilongjiang Province.

Author information

Corresponding author

Correspondence to Mingyang Wang.

About this article

Cite this article

Wang, M., Yu, G., An, S. et al. Discovery of factors influencing citation impact based on a soft fuzzy rough set model. Scientometrics 93, 635–644 (2012). https://doi.org/10.1007/s11192-012-0766-x

  • DOI: https://doi.org/10.1007/s11192-012-0766-x
