Modeling time-dependent and -independent indicators to facilitate identification of breakthrough research papers

Wolcott, Holly N.; Fouch, Matthew J.; Hsu, Elizabeth R.; DiJoseph, Leo G.; Bernaciak, Catherine A.; Corrigan, James G.; Williams, Duane E.

doi:10.1007/s11192-016-1861-1

Published: 22 February 2016

Volume 107, pages 807–817, (2016)

Scientometrics

Holly N. Wolcott¹,
Matthew J. Fouch¹,
Elizabeth R. Hsu²,
Leo G. DiJoseph¹,
Catherine A. Bernaciak¹,
James G. Corrigan² &
…
Duane E. Williams³

963 Accesses
14 Citations

Abstract

Research funding organizations invest substantial resources to monitor mission-relevant research findings to identify and support promising new lines of inquiry. To that end, we have been pursuing the development of tools to identify research publications that have a strong likelihood of driving new avenues of research. This paper describes our work towards incorporating multiple time-dependent and -independent features of publications into a model to identify candidate breakthrough papers as early as possible following publication. We used multiple random forest models to assess the ability of indicators to reliably distinguish a gold standard set of breakthrough publications as identified by subject matter experts from among a comparison group of similar Thomson Reuters Web of Science™ publications. These indicators were then tested for their predictive value in random forest models. Model parameter optimization and variable selection were used to construct a final model based on indicators that can be measured within 6 months post-publication; the final model had an estimated true positive rate of 0.77 and false positive rate of 0.01.
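The true positive rate and false positive rate reported for the final model can be computed from binary labels as in the minimal sketch below. The function name `classification_rates` and the toy label vectors are hypothetical and chosen only for illustration; they are not the authors' code or data, and the sketch covers only the two evaluation metrics, not the random forest itself.

```python
from typing import Sequence, Tuple


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for binary 0/1 labels.

    Here 1 marks a breakthrough paper and 0 marks a comparison paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")

    tpr = tp / (tp + fn)  # share of actual breakthroughs that were flagged
    fpr = fp / (fp + tn)  # share of comparison papers flagged in error
    return tpr, fpr


# Hypothetical toy labels (NOT data from the paper):
# 4 breakthrough papers followed by 6 comparison papers.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tpr, fpr = classification_rates(toy_true, toy_pred)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

A low false positive rate matters in this setting because the comparison group is far larger than the gold standard set, so even a small rate of false alarms can outnumber the true breakthroughs that are flagged.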

Acknowledgments

This study was improved by contributions from Danielle Daee (NCI) and from Di Cross and Joshua Schnell (Thomson Reuters), and it extends work by Ilya Ponomarev (formerly Thomson Reuters) and Charles Hackett (National Institute of Allergy and Infectious Diseases). This work was supported in part by NIH contract #HHS263201000058B.

Author information

Authors and Affiliations

Intellectual Property and Science, Thomson Reuters, Rockville, MD, 20850, USA
Holly N. Wolcott, Matthew J. Fouch, Leo G. DiJoseph & Catherine A. Bernaciak
Office of Science Planning and Assessment, National Cancer Institute, Bethesda, MD, 20892, USA
Elizabeth R. Hsu & James G. Corrigan
ÜberResearch, Bethesda, MD, 20814, USA
Duane E. Williams

Corresponding author

Correspondence to Duane E. Williams.

About this article

Cite this article

Wolcott, H.N., Fouch, M.J., Hsu, E.R. et al. Modeling time-dependent and -independent indicators to facilitate identification of breakthrough research papers. Scientometrics 107, 807–817 (2016). https://doi.org/10.1007/s11192-016-1861-1

Received: 17 July 2015
Published: 22 February 2016
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11192-016-1861-1