Topic categorization and representation of health community generated data

Multimedia Tools and Applications

Abstract

The representation and categorization of data released by professional health providers have been well investigated and implemented in practice, which has facilitated browsing, search and high-order learning of health information. By contrast, there have been few corresponding studies on the representation and categorization of health community generated data. Such data are usually more complex, inconsistent and ambiguous, and consequently raise challenges for data access and analytics. This paper explores various representations for health community generated data and categorizes these data in terms of health topics. In addition, this work uses pseudo-labeled data to train the supervised topic categorization models, which makes the overall categorization process unsupervised, since no manual labeling is required, and extendable to large-scale data. Extensive experiments on two real-world datasets identify which representation approaches are informative and which categorization models are effective for health community generated data.
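The abstract does not state how the pseudo-labels are produced or which classifiers are trained, so the following is only a minimal sketch of the general idea, under stated assumptions: pseudo-labels come from a hypothetical seed-keyword rule, and the supervised model is a small multinomial Naive Bayes classifier written from scratch. The topic names, seed words and example posts are invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical seed keywords per topic (an assumption, not from the paper).
SEEDS = {
    "diabetes": {"insulin", "glucose"},
    "cardiology": {"heart", "chest"},
}

def tokenize(text):
    return text.lower().split()

def pseudo_label(text):
    """Return the topic whose seed words occur most often, or None if none match."""
    tokens = tokenize(text)
    hits = {topic: sum(tok in seeds for tok in tokens)
            for topic, seeds in SEEDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

def train_nb(labeled):
    """Collect multinomial Naive Bayes counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1)
                                  / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented community posts; no manual labels are supplied.
posts = [
    "my glucose is high after meals and insulin dose changed",
    "insulin pump and sugar readings at night",
    "chest pain and heart palpitations when climbing stairs",
    "heart rate jumps and pressure feels high",
    "what is a good pillow",
]

# Step 1: assign pseudo-labels and drop posts the rule cannot label.
labeled = [(p, pseudo_label(p)) for p in posts]
labeled = [(p, y) for p, y in labeled if y is not None]

# Step 2: train a supervised model on the pseudo-labeled posts.
model = train_nb(labeled)

# Step 3: categorize a post that contains no seed keyword at all.
prediction = predict(model, "sugar readings after meals")
print(prediction)
```

The final query contains none of the seed words, so the keyword rule alone could not label it; the trained classifier can, because it has learned co-occurring vocabulary from the pseudo-labeled posts. This is one plausible reading of why training on pseudo-labels removes the need for manual annotation.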

Notes

  1. http://pewinternet.org/Reports/2013/Health-online.aspx

  2. www.webmd.com

  3. https://www.healthtap.com

  4. www.patientslikeme.com

  5. http://health.yahoo.net

  6. www.drugs.com

  7. www.haodf.com

  8. http://nlp.stanford.edu/software/tagger.shtml

  9. In this work, D2 is the general English Gigaword corpus from the Linguistic Data Consortium (http://www.ldc.upenn.edu/).

  10. http://nlp.stanford.edu/downloads/tmt/tmt-0.4/

Acknowledgments

The work presented in this paper is partially supported by the National Natural Science Foundation of China under Grant No. 61100133 and the Major Projects of National Social Science Foundation of China under Grant No. 11&ZD189.

Author information

Corresponding author

Correspondence to Maofu Liu.

About this article

Cite this article

Liu, M., Zhang, H., Hu, H. et al. Topic categorization and representation of health community generated data. Multimed Tools Appl 76, 10541–10553 (2017). https://doi.org/10.1007/s11042-015-3094-3

  • DOI: https://doi.org/10.1007/s11042-015-3094-3
