Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics on Chinese Health Websites

Chen, Xinhuan; Zhang, Yong; Xu, Jennifer; Xing, Chunxiao; Chen, Hsinchun

doi:10.1007/978-3-319-32025-0_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9642)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

As millions of people are diagnosed with diabetes every year, the demand for diabetes information continues to grow. China is one of the countries with a large population of diabetes patients, and many Chinese health websites provide diabetes-related news and articles. However, because most of these online articles are uncategorized or lack a clear topic or theme, users often cannot find topics of interest effectively and efficiently. Health topic identification and categorization on Chinese websites cannot easily be addressed by directly applying existing approaches and methods developed for English documents. To address this problem and meet users' need for diabetes-related information, we propose a deep learning based framework to identify and categorize diabetes-related topics in online Chinese articles. Our experiments on datasets of over 19,000 online articles showed that the framework categorized diabetes-related topics with higher effectiveness and accuracy than most state-of-the-art benchmark approaches.

Notes

1. We consider only the F-measure here because it is a comprehensive metric that combines both precision and recall.
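For reference, the F-measure in the note above is usually defined as the (weighted) harmonic mean of precision and recall, which is why a single number can summarize both. The sketch below assumes the balanced form (beta = 1, i.e. F1) computed for one category from true-positive, false-positive, and false-negative counts; the note does not state which variant or averaging scheme the authors used, and the function names and counts are hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        # Convention: F is 0 when the classifier gets nothing right.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_measure_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Per-category F-measure from raw counts (hypothetical helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_measure(precision, recall, beta)


if __name__ == "__main__":
    # Hypothetical counts for a single topic category, not results from the paper.
    print(round(f_measure_from_counts(tp=80, fp=20, fn=40), 4))  # prints 0.7273
```

With precision 0.8 and recall about 0.667, the harmonic mean (about 0.727) sits below the arithmetic mean (about 0.733), so a classifier cannot score well by trading one metric away for the other.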

Acknowledgements

The research presented in this paper was funded by grants from the National High-tech R&D Program of China (Grant No. SS2015AA020102), the US NSF (Grant No. IIP-1417181), the 1000-Talent Program, and the Tsinghua University Initiative Scientific Research Program.

Author information

Authors and Affiliations

Research Institute of Information Technology, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
Xinhuan Chen, Yong Zhang, Chunxiao Xing & Hsinchun Chen
Department of Computer Information Systems, Bentley University, Waltham, USA
Jennifer Xu
MIS Department, University of Arizona, Tucson, USA
Hsinchun Chen

Corresponding author

Correspondence to Xinhuan Chen.

Editor information

Editors and Affiliations

Georgia Institute of Technology, Atlanta, Georgia, USA
Shamkant B. Navathe
University of Texas at Dallas, Richardson, Texas, USA
Weili Wu
University of Minnesota, Minneapolis, Minnesota, USA
Shashi Shekhar
Renmin University, Beijing, China
Xiaoyong Du
Fudan University, Shanghai, China
X. Sean Wang
Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
Hui Xiong

About this paper

Cite this paper

Chen, X., Zhang, Y., Xu, J., Xing, C., Chen, H. (2016). Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics on Chinese Health Websites. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_30

DOI: https://doi.org/10.1007/978-3-319-32025-0_30
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32024-3
Online ISBN: 978-3-319-32025-0
eBook Packages: Computer Science, Computer Science (R0)