Mining typical features for highly cited papers

Wang, Mingyang; Yu, Guang; Yu, Daren

doi:10.1007/s11192-011-0366-1

Published: 08 March 2011

Volume 87, pages 695–706, (2011)

Scientometrics

Mingyang Wang¹,², Guang Yu¹ & Daren Yu³

923 Accesses
40 Citations

Abstract

In this paper, we discuss the application of data mining tools to identify typical features of highly cited papers (HCPs). The feature space used to model HCPs was established by integrating papers’ external features and quality features. A series of predictor teams was then extracted from the feature space using a rough set reduction framework, and each predictor team was used to construct a base classifier. The five base classifiers that combined the highest classification performance with the greatest overall diversity were selected to construct a multi-classifier system (MCS) for HCPs. The combined prediction model performed better than models built on a single predictor team. Eleven typical prediction features for HCPs were extracted on the basis of the MCS. The findings show that both a paper’s inner quality and its external features, mainly the reputation of its authors and journal, contribute to its becoming highly cited.
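The selection and combination step described in the abstract can be illustrated with a small sketch. The abstract does not state how diversity was measured, how performance and diversity were traded off, or how the base classifiers' outputs were combined, so everything below is an assumption made for illustration: Yule's Q statistic as the pairwise diversity measure, a simple weighted score (the `diversity_weight` parameter is hypothetical), majority voting as the combination rule, and made-up toy predictions. It is not the authors' implementation, and a team of three is used instead of five to keep the example small.

```python
from itertools import combinations


def accuracy(pred, truth):
    """Fraction of labels that a classifier predicted correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def q_statistic(pred_a, pred_b, truth):
    """Yule's Q for two classifiers (assumed diversity measure).

    Values near +1 mean the two classifiers are right and wrong on the
    same samples; lower values mean they err on different samples.
    """
    n11 = n00 = n10 = n01 = 0
    for a, b, t in zip(pred_a, pred_b, truth):
        a_ok, b_ok = a == t, b == t
        if a_ok and b_ok:
            n11 += 1
        elif not a_ok and not b_ok:
            n00 += 1
        elif a_ok:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    return 0.0 if denom == 0 else (n11 * n00 - n01 * n10) / denom


def select_team(preds, truth, k, diversity_weight=0.5):
    """Pick the k classifiers (k >= 2) with the best assumed score:
    mean accuracy minus diversity_weight times mean pairwise Q."""
    def score(subset):
        mean_acc = sum(accuracy(preds[i], truth) for i in subset) / k
        pairs = list(combinations(subset, 2))
        mean_q = sum(q_statistic(preds[i], preds[j], truth)
                     for i, j in pairs) / len(pairs)
        return mean_acc - diversity_weight * mean_q
    return max(combinations(range(len(preds)), k), key=score)


def majority_vote(preds, members):
    """Combine binary (0/1) predictions of the chosen classifiers."""
    n_samples = len(preds[0])
    return [1 if 2 * sum(preds[m][i] for m in members) > len(members) else 0
            for i in range(n_samples)]


# Toy data (made up): 1 = highly cited, 0 = not highly cited.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
preds = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # one error
    [1, 1, 1, 0, 0, 0, 0, 0],  # same error as the first classifier
    [0, 1, 1, 0, 0, 0, 1, 0],  # one error, on a different sample
    [1, 1, 1, 1, 0, 0, 1, 0],  # one error, on yet another sample
    [0, 0, 0, 1, 1, 1, 0, 1],  # wrong everywhere
]
team = select_team(preds, truth, k=3)
combined = majority_vote(preds, team)
```

In this toy setting the score favours classifiers that are individually accurate but make their mistakes on different samples, which is the situation in which a vote can outperform each of its members.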

Acknowledgments

We thank Dr. Xin Huang for the fruitful discussion. This work was supported by the National Natural Science Foundation of China (Grant Nos. 71003020; 70973031), the special funds of Central College Basic Scientific Research Bursary (Grant No. DL09BB51), and the research foundation of the ISTIC-Thomson Reuters Joint Lab for Scientometrics Research.

Author information

Authors and Affiliations

School of Management, Harbin Institute of Technology, Harbin, 150001, People’s Republic of China
Mingyang Wang & Guang Yu
Northeast Forestry University, Harbin, 150040, People’s Republic of China
Mingyang Wang
School of Power Engineering, Harbin Institute of Technology, Harbin, 150001, People’s Republic of China
Daren Yu

Corresponding authors

Correspondence to Mingyang Wang or Daren Yu.

About this article

Cite this article

Wang, M., Yu, G. & Yu, D. Mining typical features for highly cited papers. Scientometrics 87, 695–706 (2011). https://doi.org/10.1007/s11192-011-0366-1

Received: 11 December 2010
Published: 08 March 2011
Issue Date: June 2011
DOI: https://doi.org/10.1007/s11192-011-0366-1
