Abstract
Online health communities (OHCs) are an important source of social support for cancer survivors and their informal caregivers. This research attempted to identify leaders in a popular online forum for cancer survivors and caregivers using classification techniques. We first extracted user features from many different perspectives, including contributions, network centralities, and linguistic features. Based on these features, we leveraged the structure of the social network among users and generated new neighborhood-based and cluster-based features. Classification results revealed that these features are discriminative for leader identification. Using these features, we developed a hybrid approach based on an ensemble classifier that performs better than many traditional metrics. This research has implications for understanding and managing OHCs.
Notes
Because a one-class SVM provides only binary decisions, we specify the “outlier” ratio in the classification process so that the classifier identifies K influential users.
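The effect of fixing the ratio can be sketched as follows. This is a minimal illustration, not the authors' implementation: it does not train a one-class SVM, and the per-user scores are hypothetical stand-ins for the decision values such a model would produce. It also assumes that lower scores lie further from the bulk of users and that the users flagged as "outliers" are the influential ones. The function name `flag_top_k` and the user ids are invented for the example.

```python
def flag_top_k(scores, k):
    """Flag k users by setting the outlier ratio to k / N.

    scores: dict mapping user id -> decision score (hypothetical values;
            lower is assumed to mean "more outlying").
    Returns the ratio that was used and the list of flagged user ids.
    """
    n = len(scores)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of users")

    # The ratio is chosen so that ratio * N equals the desired K.
    ratio = k / n

    # Rank users from most to least outlying (ascending score).
    ranked = sorted(scores, key=scores.get)

    # A binary decision at this ratio flags exactly the first ratio * N users.
    n_flagged = round(ratio * n)
    return ratio, ranked[:n_flagged]


# Hypothetical decision scores for five users.
example_scores = {"u1": 0.8, "u2": -1.2, "u3": 0.1, "u4": -0.4, "u5": 0.5}
ratio, flagged = flag_top_k(example_scores, k=2)
print(ratio, flagged)
```

With a real one-class SVM the ratio is typically a bound rather than an exact count, so ranking by decision value and cutting at K, as above, is one way to obtain exactly K users.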
For esthetic purposes, a threshold of 15 was chosen to better illustrate the relationships between discussion boards. A lower threshold increases the number of edges in the figure, and a higher threshold makes the network sparser.
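The thresholding described in this note can be sketched as a simple edge filter. The board names and edge weights below are invented for illustration, and what the weight measures is not stated in the note. Treating the threshold as inclusive (weight >= threshold) is also an assumption.

```python
def filter_edges(weighted_edges, threshold):
    """Keep only the edges whose weight is at least `threshold`.

    weighted_edges: dict mapping (board_a, board_b) -> numeric weight.
    """
    return {pair: w for pair, w in weighted_edges.items() if w >= threshold}


# Hypothetical weighted links between four discussion boards.
board_edges = {
    ("A", "B"): 40,
    ("A", "C"): 15,
    ("B", "C"): 9,
    ("C", "D"): 22,
    ("A", "D"): 3,
}

# A lower threshold keeps more edges; a higher one leaves a sparser network.
for threshold in (5, 15, 30):
    kept = filter_edges(board_edges, threshold)
    print(threshold, len(kept), sorted(kept))
```

On this toy data the filter keeps 4, 3, and 1 edges at thresholds of 5, 15, and 30 respectively.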
Cite this article
Zhao, K., Greer, G.E., Yen, J. et al. Leader identification in an online health community for cancer survivors: a social network-based classification approach. Inf Syst E-Bus Manage 13, 629–645 (2015). https://doi.org/10.1007/s10257-014-0260-5
DOI: https://doi.org/10.1007/s10257-014-0260-5