Abstract
Objective
In recent years, the rapid growth of the biomedical literature has left many implicit regularities and much new knowledge buried in it. Text mining, applied to the biomedical field, can integrate and analyze massive volumes of biomedical literature and extract valuable information that improves people’s understanding of biomedical phenomena. This paper discusses the research status of text mining in the biomedical field over the past 10 years, in order to provide a reference for further studies by other researchers.
Methods
Biomedical text mining publications indexed in SCI from 2004 to 2013 were retrieved and filtered, and then analyzed in terms of annual changes, regional distribution, research institutions, journal sources, research fields, and keywords, among other dimensions.
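As a rough illustration of the kind of tallying this analysis involves, the sketch below counts records by year, by country, and by keyword. It is a minimal sketch under stated assumptions, not the authors' procedure: the record layout, the field names (`year`, `country`, `keywords`) and the sample values are all hypothetical.

```python
from collections import Counter

# Hypothetical records: the field names and values are illustrative only
# and are not taken from the data set analyzed in the paper.
records = [
    {"year": 2004, "country": "USA", "keywords": ["named entity recognition"]},
    {"year": 2004, "country": "UK", "keywords": ["text clustering"]},
    {"year": 2013, "country": "USA",
     "keywords": ["named entity recognition", "co-occurrence analysis"]},
]


def tally(records, field):
    """Count records per value of a single-valued field such as year or country."""
    return Counter(r[field] for r in records)


def tally_keywords(records):
    """Count how many records mention each keyword (each record counted once per keyword)."""
    return Counter(k for r in records for k in set(r["keywords"]))


by_year = tally(records, "year")
by_country = tally(records, "country")
by_keyword = tally_keywords(records)
```

Sorting any of these counters with `most_common()` gives the ranked view (leading years, countries, or topics) that a bibliometric summary typically reports.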
Results
The global volume of biomedical text mining literature is rising. Publications on named entity recognition, entity relation extraction, text categorization, text clustering, abbreviation extraction, and co-occurrence analysis account for a large share, and studies from the USA and the UK are in the leading position.
Conclusion
Compared with more mature research topics, the application of text mining in biomedicine is still a relatively new research field worldwide. With growing awareness of the field and deepening research, however, a number of core research areas, core research institutes, and core research fields have formed. Further research in this field will therefore inject new vitality into the development of biomedicine.


Acknowledgments
This research was supported by the Young Talent Project of Beijing (No. YETP0821) and the Research Project for Practice Development of National TCM Clinical Research Bases.
Additional information
Xing Zhai, Zhihong Li and Kuo Gao have contributed equally to this work.
Cite this article
Zhai, X., Li, Z., Gao, K. et al. Research status and trend analysis of global biomedical text mining studies in recent 10 years. Scientometrics 105, 509–523 (2015). https://doi.org/10.1007/s11192-015-1700-9
DOI: https://doi.org/10.1007/s11192-015-1700-9