Abstract
The relationship between tea and health is an emerging research problem that has so far been addressed mainly through a limited number of literature reviews, and it poses an important and interesting challenge. Although past studies have covered various topics and purposes concerning tea and health, a gap remains: hybrid intelligence-based techniques have not been used to extract useful information from the literature on the curative effects of tea against fatal diseases, outside of Western medical therapies. This study is motivated to bridge that gap for healthcare applications concerning tea and health. It proposes a hybrid method that combines an objective, intelligent text mining technique with topic modeling, principally latent Dirichlet allocation (LDA) fitted with the variational expectation-maximization (VEM) method and with Gibbs sampling, and it measures model performance with three evaluation metrics on published articles. For the experimental materials, we collected 2109 journal articles published from 2010 to 2017 from the Web of Science and divided them into three datasets, one for each of three periods, to identify differences that point to future trends.
This study contributes in eight directions: (1) from a technical view, the perplexity curve for VEM resembles a scree plot, whereas the curve for Gibbs sampling is smooth and well behaved; interestingly, the greater the number of topics, the lower the perplexity; (2) in the empirical results, the terms of the primary topics are tea and tea compounds, and the secondary topics are associated with terms for issues regarding tea and health; (3) the study yields five research findings, with key empirical evidence that tea has a natural and important preventive effect on diseases, especially cancer; (4) as a managerial implication, greater tea consumption is suggested as a valuable healthcare activity with medicinal purposes for early prevention and treatment; (5) regarding novelty, the study fills the gap with a hybrid, knowledge-based, objective text mining and topic modeling technique, whereas past research on tea and health issues relied only on traditional content analysis methods; (6) regarding strengths, the method reduces manpower costs and is relatively objective, and the objective LDA method has rarely been used for topic modeling in past studies; (7) regarding research significance, the proposed method enables efficient analysis of massive amounts of extant data to explore latent information, accelerates the research process, improves understanding for forming new hypotheses, and identifies key questions for further research; and (8) regarding overall research importance, the study offers new rationales for medical application and identifies differences and gaps for studying research trends.
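The perplexity comparison across topic counts can be illustrated with a minimal sketch, assuming a tiny tokenized toy corpus in place of the article abstracts. It fits LDA by collapsed Gibbs sampling and reports perplexity for several numbers of topics. The function names, the hyperparameters alpha and beta, the iteration count, and the example tokens are illustrative assumptions and not the configuration used in this study. The VEM variant is not shown, and perplexity here is computed on the training documents rather than on held-out data.

```python
import math
import random


def fit_lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, corpus, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    corpus = [[word_id[w] for w in doc] for doc in docs]

    # Count tables: topics per document, words per topic, tokens per topic.
    n_dk = [[0] * num_topics for _ in corpus]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(corpus):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of document-topic and topic-word distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(corpus)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in topics
    ]
    return theta, phi, corpus, vocab


def perplexity(theta, phi, corpus):
    """exp of the negative mean per-token log-likelihood under theta and phi."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(corpus):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Hypothetical toy documents standing in for preprocessed abstracts.
toy_docs = [
    ["tea", "catechin", "polyphenol", "green", "tea", "egcg"],
    ["tea", "cancer", "risk", "prevention", "cohort", "cancer"],
    ["green", "tea", "egcg", "polyphenol", "antioxidant", "catechin"],
    ["tea", "diabetes", "risk", "consumption", "cohort", "stroke"],
    ["black", "tea", "consumption", "stroke", "risk", "cancer"],
    ["egcg", "cancer", "prevention", "polyphenol", "green", "tea"],
]

for num_topics in (2, 3, 4):
    theta, phi, corpus, _ = fit_lda_gibbs(toy_docs, num_topics)
    print(num_topics, round(perplexity(theta, phi, corpus), 3))
```

On a real corpus the same loop would be run per period dataset, and the perplexity values would be plotted against the number of topics to compare the shapes of the curves produced by the two estimation methods.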
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Chen, YS., Cheng, CH. & Hung, WL. A systematic review to identify the effects of tea by integrating an intelligence-based hybrid text mining and topic model. Soft Comput 25, 3291–3315 (2021). https://doi.org/10.1007/s00500-020-05377-7
DOI: https://doi.org/10.1007/s00500-020-05377-7