Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics on Chinese Health Websites

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9642)

Abstract

As millions of people are diagnosed with diabetes every year, the demand for information about diabetes continues to increase. China is one of the countries with a large population of diabetes patients, and many Chinese health websites provide diabetes-related news and articles. However, because most online articles are uncategorized or lack a clear topic and theme, users often cannot find their topics of interest effectively and efficiently. Health topic identification and categorization on Chinese websites cannot be easily addressed by straightforwardly applying existing approaches and methods that have been used for English documents. To address this problem and meet users’ demand for diabetes-related information, we propose a deep learning based framework to identify and categorize topics related to diabetes in online Chinese articles. Our experiments using datasets with over 19,000 online articles showed that the framework achieved higher effectiveness and accuracy in categorizing diabetes-related topics than most of the state-of-the-art benchmark approaches.

Notes

  1. We consider only F-measure here because it is a comprehensive metric incorporating both Precision and Recall.
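As a rough illustration of why F-measure can stand in for both metrics, the sketch below computes it as the weighted harmonic mean of Precision and Recall. The note does not say which variant the paper uses; the balanced form (beta = 1, often called F1) is assumed here, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption; the note
    does not state which variant the paper reports).
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # A classifier with high precision but low recall is penalized:
    # the harmonic mean stays close to the weaker of the two values.
    print(f_measure(0.9, 0.3))
    print(f_measure(0.6, 0.6))
```

Because the harmonic mean is dominated by the smaller input, a method cannot score well on F-measure by maximizing only one of Precision or Recall.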

Acknowledgements

The research presented in this paper has been funded by grants from the National High-tech R&D Program of China (Grant No. SS2015AA020102), the US NSF grant (IIP-1417181), the 1000-Talent program, and Tsinghua University Initiative Scientific Research Program.

Author information

Corresponding author

Correspondence to Xinhuan Chen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, X., Zhang, Y., Xu, J., Xing, C., Chen, H. (2016). Deep Learning Based Topic Identification and Categorization: Mining Diabetes-Related Topics on Chinese Health Websites. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_30

  • DOI: https://doi.org/10.1007/978-3-319-32025-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32024-3

  • Online ISBN: 978-3-319-32025-0

  • eBook Packages: Computer Science (R0)
