A bibliometric analysis of text mining in medical research

Soft Computing

Abstract

Text mining plays an increasingly significant role in processing medical information. Research on text mining enhanced medicine has attracted much attention, as reflected in the substantial expansion of the literature. This study systematically reviews the existing academic research output of the field from Web of Science and PubMed, using techniques such as geographic visualization, collaboration degree, social network analysis, and topic modeling. Specifically, the statistical characteristics of publications, geographical distribution, collaboration relations, and research topics are quantitatively analyzed. This study contributes to the field of text mining enhanced medical research in three ways. First, it presents the latest research status, through literature analysis, for researchers interested in the field. Second, it helps scholars become more aware of the research subfields through hot topic identification. Third, it offers insights to researchers working in the field and draws attention to the relevant research.

Notes

  1. https://en.wikipedia.org/wiki/Text_mining.

  2. http://www.zhukun.org/haoty/resources.asp?id=JSC_cocountry.

  3. http://www.zhukun.org/haoty/resources.asp?id=JSC_coaffiliation.

  4. http://www.zhukun.org/haoty/resources.asp?id=JSC_coauthor.

  5. https://en.wikipedia.org/wiki/Parenting.

Acknowledgements

This work was funded by grants from the National Natural Science Foundation of China (No. 61772146) and the Guangzhou Science Technology and Innovation Commission (No. 201803010063).

Author information

Corresponding author

Correspondence to Xieling Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by B. B. Gupta.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 8.

Table 8 The list of keywords related to “text mining”, as determined by domain experts in the field

About this article

Cite this article

Hao, T., Chen, X., Li, G. et al. A bibliometric analysis of text mining in medical research. Soft Comput 22, 7875–7892 (2018). https://doi.org/10.1007/s00500-018-3511-4

  • DOI: https://doi.org/10.1007/s00500-018-3511-4
