Abstract
Understanding users’ sentiment expression in social media is important in many domains, such as marketing and online applications. Is one demographic group inherently different from another? Does a group express the same sentiment in both private and public? How can we compare the sentiments of different groups composed of multiple attributes? In this paper, we take an interdisciplinary approach toward mining the patterns of textual sentiments and metadata. First, we look into several existing hypotheses in social science on the interplay between user characteristics and sentiments, as well as the related evidence in the field of social network data analysis. Second, we present a dataset with unique features (Facebook users’ chats and posts in multiple languages) and a procedure to process the data. Third, we test our hypotheses on this dataset and interpret the results. Fourth, under the subgroup discovery paradigm, we present an approach with two algorithms that generalizes single-attribute testing. This approach provides more detailed insight into the relationships among attributes and reveals interesting attribute value combinations with distinct sentiments. It also offers novel hypotheses for examination in future studies. Fifth, because the number of mined subgroup comparisons can be large, we develop an exploratory visualization tool that summarizes the comparisons and highlights meta-patterns.
Notes
Namely English, Dutch, French, German, Italian, Polish, Portuguese, Russian, Spanish, Swedish and Turkish.
Due to the limited space of this paper, we summarize and selectively report the results of the post hoc pairwise tests in Sect. 4. A complete report can be found at http://beaugogh.github.io/visualizations/mcells/data/pairwise.
Because of the use of the Mann–Whitney U test.
We use “relation.” to denote the relationship status “in a relationship.”
The tasks that are unique in Yi et al. (2007) are marked with \(*\).
The online article by Krzywinski (2009) gives examples of the deficiencies of tables for showing data.
In a force-directed graph layout, heavily connected nodes form clusters.
The questions Q3 and Q4 also fall under this task description, because Q3 inquires about the relationship between two columns, and Q4 inquires about the relationship between a set of comparisons and their corresponding items.
References
Abbasi A, Hassan A, Dhar M (2014) Benchmarking Twitter sentiment analysis tools. In: The international conference on language resources and evaluation. pp 823–829
Amar R, Eagan J, Stasko J (2005) Low-level components of analytic activity in information visualization. In: IEEE Symposium on information visualization, 2005. INFOVIS 2005. IEEE, pp 111–117
De Wolf R, Gao B, Berendt B, Pierson J (2015) The promise of audience transparency: exploring users’ perceptions and behaviors towards visualizations of networked audiences on Facebook. Telemat Inform 32(4):890–908
Dzyuba V, van Leeuwen M (2013) Interactive discovery of interesting subgroup sets. In: Advances in intelligent data analysis XII. Springer, pp 150–161
Furnas GW (1986) Generalized fisheye views, vol 17. ACM, New York
Gratzl S, Lex A, Gehlenborg N, Pfister H, Streit M (2013) Lineup: visual analysis of multi-attribute rankings. IEEE Trans Vis Comput Graph 19(12):2277–2286
Gross JJ, Carstensen LL, Pasupathi M, Tsai J, Götestam Skorpen C, Hsu AY (1997) Emotion and aging: experience, expression, and control. Psychol Aging 12(4):590
Gross JJ, Richards JM, John OP (2006) Emotion regulation in everyday life. Emot Regul Couples Fam: Pathw Dysfunct Health 2006:13–35
Holten D (2006) Hierarchical edge bundles: visualization of adjacency relations in hierarchical data. IEEE Trans Vis Comput Graph 12(5):741–748
Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, pp 249–271
Kramer AD, Guillory JE, Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci 111(24):8788–8790
Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J Am Stat Assoc 47(260):583–621
Krzywinski M (2009) Circos visualizing tables, part I. http://circos.ca/presentations/articles/vis_tables1/
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19(9):1639–1645
Leman D, Feelders A, Knobbe A (2008) Exceptional model mining. In: Machine learning and knowledge discovery in databases. Springer, pp 1–16
Leung CKS, Irani PP, Carmichael CL (2008a) Fisviz: a frequent itemset visualizer. In: Advances in knowledge discovery and data mining. Springer, pp 644–652
Leung CKS, Irani PP, Carmichael CL (2008b) Wifisviz: effective visualization of frequent itemsets. In: Eighth IEEE international conference on data mining, 2008. ICDM’08. IEEE, pp 875–880
Lui M, Baldwin T (2012) langid.py: An off-the-shelf language identification tool. In: Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, pp 25–30
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50–60
Lui M, Baldwin T (2014) Accurate language identification of Twitter messages. In: Proceedings of the 5th workshop on language analysis for social media (LASM)@EACL. pp 17–25
Munzner T, Kong Q, Ng RT, Lee J, Klawe J, Radulovic D, Leung CK (2005) Visual mining of power sets with large alphabets. Technical Report UBC CS TR-2005-25, Department of Computer Science, The University of British Columbia, Vancouver
Nakatani S (2011) Language detection library for Java. https://code.google.com/p/language-detection/
Pham T, Hess R, Ju C, Zhang E, Metoyer R (2010) Visualization of diversity in large multivariate data sets. IEEE Trans Vis Comput Graph 16(6):1053–1062
Rao R, Card SK (1995) Exploring large tables with the table lens. In: Conference companion on human factors in computing systems. ACM, pp 403–404
Scherer KR (2005) What are emotions? And how can they be measured? Soc Sci Inf 44(4):695–729
Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52:591–611
Shiffrin RM, Schneider W (1977) Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychol Rev 84(2):127
Shneiderman B (1996) The eyes have it: A task by data type taxonomy for information visualizations. In: Visual languages, 1996. Proceedings., IEEE Symposium on. IEEE, pp 336–343
Siganos A, Vagenas-Nanos E, Verwijmeren P (2014) Facebook’s daily sentiment and international stock markets. J Econ Behav Organ 107:730–743
Stone AA, Schwartz JE, Broderick JE, Deaton A (2010) A snapshot of the age distribution of psychological well-being in the United States. Proc Natl Acad Sci 107(22):9985–9990
Taylor J (2009) Emotional experience and romantic relationship status in emerging adult college women and men. Colorado State University, Fort Collins
Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010a) Sentiment strength detection in short informal text. J Am Soc Inf Sci Technol 61(12):2544–2558
Thelwall M, Wilkinson D, Uppal S (2010b) Data mining emotion in social network communication: gender differences in MySpace. J Am Soc Inf Sci Technol 61(1):190–199
Tukey JW (1977) Box-and-whisker plots. In: Exploratory data analysis, pp 39–43
van Leeuwen M, Knobbe A (2012) Diverse subgroup set discovery. Data Min Knowl Discov 25(2):208–242
Ware C (2012) Information visualization: perception for design. Elsevier, Amsterdam
Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Principles of data mining and knowledge discovery. Springer, pp 78–87
Yap SC, Anusic I, Lucas RE (2012) Does personality moderate reaction and adaptation to major life events? Evidence from the British Household Panel Survey. J Res Personal 46(5):477–488
Yi JS, ah Kang Y, Stasko JT, Jacko JA (2007) Toward a deeper understanding of the role of interaction in information visualization. IEEE Trans Vis Comput Graph 13(6):1224–1231
Zhou MX, Feiner SK (1998) Visual task characterization for automated visual discourse synthesis. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co, pp 392–399
Acknowledgments
We thank Prof. Thelwall for his support with the SentiStrength tool. The research presented in this paper was supported by the Strategic Basic Research (SBO) Program of the Flemish Agency for Innovation via Science and Technology (IWT) through the project SPION (Grant No. 100048), and by the Fund for Scientific Research for Flanders (FWO) through the project Data Mining for Privacy in Social Networks (Grant No. G068611N).
Cite this article
Gao, B., Berendt, B. & Vanschoren, J. Toward understanding online sentiment expression: an interdisciplinary approach with subgroup comparison and visualization. Soc. Netw. Anal. Min. 6, 68 (2016). https://doi.org/10.1007/s13278-016-0385-2
DOI: https://doi.org/10.1007/s13278-016-0385-2