Abstract
Depression has become the world’s fourth major disease. Despite its high incidence, the rate of medical treatment for depression is very low because mental problems are difficult to diagnose. Social media opens a window onto users’ mental status: with the rapid development of the Internet, people have become accustomed to expressing their thoughts and feelings through social media, which therefore offers a new way to find potentially depressed people. In this paper, we propose a multi-kernel SVM based model to recognize depressed people. Three categories of features, namely user microblog text, user profile and user behaviors, are extracted from users’ social media to describe their situations. To account for the distinctive characteristics of social media language, we build a special emotional dictionary, consisting of a text emotional dictionary and an emoticon dictionary, and use it to compute word-frequency statistics as microblog text features. Because the text features are heterogeneous with respect to the other two feature categories, we employ multi-kernel SVM methods that adaptively select the optimal kernel for each feature category to find users who may suffer from depression. The multi-kernel SVM method identifies depressed people with an error rate of 16.54%; relative to Naive Bayes, Decision Trees, KNN, single-kernel SVM and an ensemble method (libD3C), this reduces the error rate by 38, 43, 22, 21 and 11%, respectively. This indicates that, among the methods compared, the multi-kernel SVM method is the most appropriate for finding depressed people from social media data.
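To make the dictionary-based text features concrete, here is a minimal Python sketch of user-level word-frequency statistics over a combined text and emoticon dictionary. It assumes each microblog has already been segmented into tokens, and the dictionaries, category names and emoticon codes such as `[cry]` are hypothetical placeholders, not the authors’ actual dictionary.

```python
from collections import Counter

# Hypothetical, tiny dictionaries for illustration only; the paper's actual
# text emotional dictionary and emoticon dictionary are not reproduced here.
TEXT_EMOTION_DICT = {
    "sad": "negative",
    "hopeless": "negative",
    "tired": "negative",
    "happy": "positive",
    "great": "positive",
}
EMOTICON_DICT = {
    "[cry]": "negative",
    "[sigh]": "negative",
    "[smile]": "positive",
}
CATEGORIES = ("positive", "negative")


def user_emotion_features(posts):
    """Return emotion-category frequencies over all of a user's posts.

    posts is a list of already-tokenized microblogs. Each feature is the
    number of tokens falling in one (source, category) bucket divided by the
    user's total token count, so users who post more are still comparable.
    """
    tokens = [tok for post in posts for tok in post]
    counts = Counter()
    for tok in tokens:
        if tok in EMOTICON_DICT:
            counts[("emoticon", EMOTICON_DICT[tok])] += 1
        elif tok.lower() in TEXT_EMOTION_DICT:
            counts[("word", TEXT_EMOTION_DICT[tok.lower()])] += 1
    total = len(tokens) or 1  # avoid division by zero for users with no text
    # Feature order: word/positive, word/negative, emoticon/positive, emoticon/negative
    return [counts[(src, cat)] / total
            for src in ("word", "emoticon")
            for cat in CATEGORIES]


posts = [["so", "tired", "today", "[cry]"], ["feeling", "sad", "again"]]
print(user_emotion_features(posts))
```

Separating word and emoticon counts keeps the two dictionaries as distinct feature dimensions; whether the original model merges or separates them is not stated in the abstract.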
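The core idea behind using multiple kernels for heterogeneous features can be sketched as a weighted sum of per-feature-group Gram matrices, K = Σ_m β_m K_m with β_m ≥ 0 and Σ_m β_m = 1; a non-negative combination of valid kernels is itself a valid kernel. The Python sketch below assumes hypothetical feature values, kernel choices and hand-fixed weights. It does not reproduce the paper’s procedure for adaptively selecting kernels or learning the weights, and it does not train the SVM itself.

```python
import math


# Hypothetical base kernels; which kernels the paper selects per group is not shown here.
def linear_kernel(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(kernel, X):
    """n x n matrix of kernel values between all pairs of rows of X."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def combined_gram(groups, kernels, weights):
    """Return K = sum_m weights[m] * K_m over heterogeneous feature groups.

    groups[m] lists every user's feature vector for feature group m (same user
    order in every group), kernels[m] is the kernel applied to that group, and
    weights must be non-negative and sum to 1.
    """
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(groups[0])
    K = [[0.0] * n for _ in range(n)]
    for X_m, k_m, beta_m in zip(groups, kernels, weights):
        G = gram_matrix(k_m, X_m)
        for i in range(n):
            for j in range(n):
                K[i][j] += beta_m * G[i][j]
    return K


# Three hypothetical users, one feature vector per user in each group.
text_features = [[0.0, 0.33], [0.2, 0.0], [0.1, 0.1]]
profile_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
behavior_features = [[0.5], [0.2], [0.9]]

K = combined_gram(
    [text_features, profile_features, behavior_features],
    [rbf_kernel, linear_kernel, rbf_kernel],
    [0.5, 0.25, 0.25],  # fixed by hand here; an MKL method would learn these
)
for row in K:
    print([round(v, 3) for v in row])
```

A matrix built this way could be passed to an SVM that accepts precomputed kernels, for example scikit-learn’s `SVC(kernel="precomputed")`. Full MKL methods such as SimpleMKL instead optimize the weights jointly with the SVM objective.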
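The error comparison in the abstract can be unpacked with simple arithmetic. Assuming each reported percentage is a relative reduction of the corresponding baseline’s error rate, r = (e_b − e_m) / e_b, a baseline error rate can be recovered as e_b = e_m / (1 − r). This Python sketch performs that calculation; the interpretation is an assumption, the outputs are approximate because the reductions are rounded, and they are not figures reported by the authors.

```python
# Assumption: the percentages in the abstract are relative reductions of each
# baseline's error rate, i.e. r = (e_baseline - e_mksvm) / e_baseline.
def relative_error_reduction(e_baseline, e_new):
    """Fraction by which e_new lowers e_baseline."""
    return (e_baseline - e_new) / e_baseline


def implied_baseline_error(e_new, reduction):
    """Invert r = (e_b - e_new) / e_b to get e_b = e_new / (1 - r)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return e_new / (1.0 - reduction)


MKSVM_ERROR = 16.54  # multi-kernel SVM error rate (%) stated in the abstract
REPORTED_REDUCTIONS = {
    "Naive Bayes": 0.38,
    "Decision Trees": 0.43,
    "KNN": 0.22,
    "single-kernel SVM": 0.21,
    "libD3C": 0.11,
}

for name, r in REPORTED_REDUCTIONS.items():
    e_b = implied_baseline_error(MKSVM_ERROR, r)
    # Approximate only: the reported reductions are rounded to whole percents.
    print(f"{name}: implied baseline error ~ {e_b:.1f}%")
```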
Notes
World Health Organization, http://www.who.int/topics/depression/en/.
Acknowledgements
This work was partly supported by the National Program on Key Basic Research Project under Grant 2013CB329304, the National Natural Science Foundation of China under Grant 61222210, and New Century Excellent Talents in University under Grant NCET-12-0399.
About this article
Cite this article
Peng, Z., Hu, Q. & Dang, J. Multi-kernel SVM based depression recognition using social media data. Int. J. Mach. Learn. & Cyber. 10, 43–57 (2019). https://doi.org/10.1007/s13042-017-0697-1