Abstract
As millions of people are diagnosed with diabetes every year, demand for information about the disease continues to grow. China is among the countries with a large population of diabetes patients, and many Chinese health websites publish diabetes-related news and articles. However, because most of these online articles are uncategorized or lack a clear topic, users often cannot find the topics they are interested in effectively or efficiently. Existing approaches to health topic identification and categorization were developed for English documents and cannot be applied to Chinese websites in a straightforward manner. To address this problem and meet users' demand for diabetes-related information, we propose a deep learning-based framework that identifies and categorizes diabetes-related topics in online Chinese articles. In experiments on datasets of more than 19,000 online articles, the framework categorized diabetes-related topics more effectively and accurately than most state-of-the-art benchmark approaches.
Notes
1. We consider only F-measure here because it is a comprehensive metric incorporating both Precision and Recall.
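To illustrate why F-measure summarizes both Precision and Recall, the following minimal sketch computes all three from raw counts. It assumes the balanced F1 variant (beta = 1), which the note does not state explicitly; the function name `f_measure` and the example counts are hypothetical and not taken from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from true/false positive and false negative counts.

    beta=1.0 gives the balanced F1 score; this default is an assumption,
    since the source does not say which F-measure variant it uses.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was predicted correctly.
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts for one topic category (illustrative only).
precision, recall, f1 = f_measure(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of the two quantities, it stays low whenever either Precision or Recall is low, so a single score cannot be inflated by optimizing one at the expense of the other.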
Acknowledgements
The research presented in this paper was funded by grants from the National High-tech R&D Program of China (Grant No. SS2015AA020102), the US NSF (Grant No. IIP-1417181), the 1000-Talent program, and the Tsinghua University Initiative Scientific Research Program.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, X., Zhang, Y., Xu, J., Xing, C., Chen, H. (2016). Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics on Chinese Health Websites. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_30
DOI: https://doi.org/10.1007/978-3-319-32025-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32024-3
Online ISBN: 978-3-319-32025-0
eBook Packages: Computer Science, Computer Science (R0)