Abstract
Text mining has come to play an increasingly significant role in processing medical information. Research on text-mining-enhanced medical research has attracted considerable attention, given the substantial expansion of the literature. This study systematically reviews the field's academic research output indexed in Web of Science and PubMed, using techniques such as geographic visualization, collaboration degree analysis, social network analysis, and topic modeling. Specifically, it quantitatively analyzes the statistical characteristics of the publications, their geographical distribution, collaboration relations, and research topics. The study contributes to the field in three ways. First, through literature analysis, it gives interested researchers an up-to-date picture of the field's research status. Second, by identifying hot topics, it makes scholars more aware of the field's research subfields. Third, it offers insights to researchers working in the field and draws attention to related research.
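The collaboration degree and social network measures named above can be illustrated in a few lines. This is a minimal sketch, not the study's implementation. The exact formula the authors used is not stated here, so the sketch assumes two common definitions: the mean number of authors per paper, and the share of multi-authored papers. The author lists are hypothetical placeholders, not records from Web of Science or PubMed. In the co-authorship network, an author's degree is simply the number of distinct co-authors.

```python
from itertools import combinations

# Hypothetical author lists (placeholders, not real Web of Science/PubMed records).
papers = [
    ["A", "B", "C"],
    ["B", "D", "A"],
    ["E"],
    ["B", "F", "A"],
]


def collaboration_degree(papers):
    """Mean number of distinct authors per paper (assumed definition)."""
    return sum(len(set(authors)) for authors in papers) / len(papers)


def share_multi_authored(papers):
    """Fraction of papers with more than one author (alternative assumed definition)."""
    return sum(1 for authors in papers if len(set(authors)) > 1) / len(papers)


def coauthorship_degrees(papers):
    """Degree of each author in an undirected co-authorship network,
    i.e. the number of distinct co-authors across all papers."""
    neighbours = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            neighbours.setdefault(author, set())  # keep solo authors as isolated nodes
        for a, b in combinations(unique, 2):       # every co-author pair is an edge
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(links) for author, links in neighbours.items()}


if __name__ == "__main__":
    print(collaboration_degree(papers))   # 2.5
    print(share_multi_authored(papers))   # 0.75
    print(coauthorship_degrees(papers))   # {'A': 4, 'B': 4, 'C': 2, 'D': 2, 'E': 0, 'F': 2}
```

Degree is the simplest centrality measure; for larger networks, a graph library such as NetworkX also provides measures like betweenness centrality.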
Acknowledgements
This work was funded by grants from the National Natural Science Foundation of China (No. 61772146) and the Guangzhou Science Technology and Innovation Commission (No. 201803010063).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by B. B. Gupta.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
See Table 8.
About this article
Cite this article
Hao, T., Chen, X., Li, G. et al. A bibliometric analysis of text mining in medical research. Soft Comput 22, 7875–7892 (2018). https://doi.org/10.1007/s00500-018-3511-4
DOI: https://doi.org/10.1007/s00500-018-3511-4