Research status and trend analysis of global biomedical text mining studies in recent 10 years

Scientometrics

Abstract

Objective

In recent years, the biomedical literature has grown rapidly, and many implicit patterns and much new knowledge remain buried in it. Text mining technology, applied to the biomedical field, can integrate and analyze massive amounts of biomedical literature data and yield valuable information that improves our understanding of biomedical phenomena. This paper discusses the research status of text mining technology in the biomedical field over the past 10 years, to provide a reference for further studies by other researchers.

Methods

Biomedical text mining literature indexed in SCI from 2004 to 2013 was retrieved and filtered, and then analyzed in terms of annual changes, regional distribution, research institutions, journal sources, research fields, keywords, and other aspects.
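As a rough illustration of this kind of tallying, the minimal sketch below counts a handful of records by year, country, and keyword. The records, the field names, and the `tally` helper are all hypothetical and chosen for illustration; they are not the authors' data or tooling.

```python
from collections import Counter

# Hypothetical bibliographic records (illustrative only, not the paper's data).
records = [
    {"year": 2004, "country": "USA",
     "keywords": ["text mining", "named entity recognition"]},
    {"year": 2004, "country": "UK",
     "keywords": ["text mining", "text clustering"]},
    {"year": 2013, "country": "USA",
     "keywords": ["named entity recognition"]},
]


def tally(records, field):
    """Count records per value of `field`.

    List-valued fields (such as keywords) contribute each distinct
    item once per record; records missing the field are skipped.
    """
    counts = Counter()
    for rec in records:
        value = rec.get(field)
        if value is None:
            continue
        items = value if isinstance(value, list) else [value]
        counts.update(set(items))
    return counts


by_year = tally(records, "year")
by_country = tally(records, "country")
by_keyword = tally(records, "keywords")

# Most frequent values first, as one might list them in a summary table.
for name, counts in [("year", by_year), ("country", by_country),
                     ("keyword", by_keyword)]:
    print(name, counts.most_common())
```

A real analysis would read the exported database records instead of an inline list, and would need to normalize country and keyword spellings before counting.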

Results

The total amount of global biomedical text mining literature is on the rise. Literature on named entity recognition, entity relation extraction, text categorization, text clustering, abbreviation extraction, and co-occurrence analysis accounts for a large share, and studies from the USA and the UK are in the leading position.

Conclusion

Compared with more mature research topics, the application of text mining technology in biomedicine is still a relatively new research field worldwide. With growing awareness of the field and deepening research, a number of core research areas, core research institutes, and core research fields have formed. Further research in this field will therefore inject new vitality into the development of biomedicine.

Acknowledgments

This research was supported by the Young Talent Project of Beijing (No. YETP0821) and the Research Project for Practice Development of National TCM Clinical Research Bases.

Author information

Corresponding author

Correspondence to Le Wang.

Additional information

Xing Zhai, Zhihong Li and Kuo Gao have contributed equally to this work.

Electronic supplementary material

Supplementary material 1 (DOCX 27 kb)

About this article

Cite this article

Zhai, X., Li, Z., Gao, K. et al. Research status and trend analysis of global biomedical text mining studies in recent 10 years. Scientometrics 105, 509–523 (2015). https://doi.org/10.1007/s11192-015-1700-9

  • DOI: https://doi.org/10.1007/s11192-015-1700-9
