Toward understanding online sentiment expression: an interdisciplinary approach with subgroup comparison and visualization

  • Original Article
  • Social Network Analysis and Mining

Abstract

Understanding users’ sentiment expression in social media is important in many domains, such as marketing and online applications. Is one demographic group inherently different from another? Does a group express the same sentiment in both private and public? How can we compare the sentiments of different groups defined by multiple attributes? In this paper, we take an interdisciplinary approach toward mining the patterns of textual sentiments and metadata. First, we look into several existing hypotheses in social science on the interplay between user characteristics and sentiments, as well as the related evidence in the field of social network data analysis. Second, we present a dataset with unique features (Facebook users’ chats and posts in multiple languages) and a procedure to process the data. Third, we test our hypotheses on this dataset and interpret the results. Fourth, under the subgroup discovery paradigm, we present an approach with two algorithms that generalizes single-attribute testing. This approach provides more detailed insight into the relationships among attributes and reveals interesting attribute value combinations with distinct sentiments. It also offers novel hypotheses for examination in future studies. Fifth, because the number of mined subgroup comparisons can be large, we develop an exploratory visualization tool that summarizes the comparisons and highlights meta-patterns.

Notes

  1. https://code.google.com/p/cld2/.

  2. Namely English, Dutch, French, German, Italian, Polish, Portuguese, Russian, Spanish, Swedish and Turkish.

  3. Due to the limited space of this paper, we summarize and selectively report the results of the post hoc pairwise tests in Sect. 4. A complete report can be found at http://beaugogh.github.io/visualizations/mcells/data/pairwise.

  4. Because of the use of the Mann–Whitney U test.

  5. We use “relation.” to denote the relationship status “in a relationship.”

  6. The tasks that are unique in Yi et al. (2007) are marked with \(*\).

  7. The online article Krzywinski (2009) gives examples of the deficiencies of tables for showing data.

  8. https://en.wikipedia.org/wiki/Chord_diagram.

  9. http://beaugogh.github.io/visualizations/mcells/.

  10. https://github.com/mbostock/d3/wiki/Force-Layout.

  11. In a force-directed graph layout, heavily connected nodes form clusters.

  12. The questions Q3 and Q4 also fall under this task description, because Q3 inquires about the relationship between two columns, and Q4 inquires about the relationship between a set of comparisons and their corresponding items.

References

  • Abbasi A, Hassan A, Dhar M (2014) Benchmarking Twitter sentiment analysis tools. In: The international conference on language resources and evaluation. pp 823–829

  • Amar R, Eagan J, Stasko J (2005) Low-level components of analytic activity in information visualization. In: IEEE Symposium on information visualization, 2005. INFOVIS 2005. IEEE, pp 111–117

  • De Wolf R, Gao B, Berendt B, Pierson J (2015) The promise of audience transparency: exploring users’ perceptions and behaviors towards visualizations of networked audiences on Facebook. Telemat Inform 32(4):890–908

  • Dzyuba V, van Leeuwen M (2013) Interactive discovery of interesting subgroup sets. In: Advances in intelligent data analysis XII. Springer, pp 150–161

  • Furnas GW (1986) Generalized fisheye views, vol 17. ACM, New York

  • Gratzl S, Lex A, Gehlenborg N, Pfister H, Streit M (2013) Lineup: visual analysis of multi-attribute rankings. IEEE Trans Vis Comput Graph 19(12):2277–2286

  • Gross JJ, Carstensen LL, Pasupathi M, Tsai J, Götestam Skorpen C, Hsu AY (1997) Emotion and aging: experience, expression, and control. Psychol Aging 12(4):590

  • Gross JJ, Richards JM, John OP (2006) Emotion regulation in everyday life. Emot Regul Couples Fam: Pathw Dysfunct Health 2006:13–35

  • Holten D (2006) Hierarchical edge bundles: visualization of adjacency relations in hierarchical data. IEEE Trans Vis Comput Graph 12(5):741–748

  • Klösgen W (1996) Explora: a multipattern and multistrategy discovery assistant. In: Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, pp 249–271

  • Kramer AD, Guillory JE, Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci 111(24):8788–8790

  • Kruskal WH, Wallis WA (1952) Use of ranks in one-criterion variance analysis. J Am Stat Assoc 47(260):583–621

  • Krzywinski M (2009) Circos visualizing tables, part I. http://circos.ca/presentations/articles/vis_tables1/

  • Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19(9):1639–1645

  • Leman D, Feelders A, Knobbe A (2008) Exceptional model mining. In: Machine learning and knowledge discovery in databases. Springer, pp 1–16

  • Leung CKS, Irani PP, Carmichael CL (2008a) Fisviz: a frequent itemset visualizer. In: Advances in knowledge discovery and data mining. Springer, pp 644–652

  • Leung CKS, Irani PP, Carmichael CL (2008b) Wifisviz: effective visualization of frequent itemsets. In: Eighth IEEE international conference on data mining, 2008. ICDM’08. IEEE, pp 875–880

  • Lui M, Baldwin T (2012) langid.py: An off-the-shelf language identification tool. In: Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, pp 25–30

  • Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50–60

  • Lui M, Baldwin T (2014) Accurate language identification of Twitter messages. In: Proceedings of the 5th workshop on language analysis for social media (LASM)@EACL. pp 17–25

  • Munzner T, Kong Q, Ng RT, Lee J, Klawe J, Radulovic D, Leung CK (2005) Visual mining of power sets with large alphabets. Technical Report UBC CS TR-2005-25, Department of Computer Science, The University of British Columbia, Vancouver

  • Nakatani S (2011) Language detection library for Java. https://code.google.com/p/language-detection/

  • Pham T, Hess R, Ju C, Zhang E, Metoyer R (2010) Visualization of diversity in large multivariate data sets. IEEE Trans Vis Comput Graph 16(6):1053–1062

  • Rao R, Card SK (1995) Exploring large tables with the table lens. In: Conference companion on human factors in computing systems. ACM, pp 403–404

  • Scherer KR (2005) What are emotions? And how can they be measured? Soc Sci Inf 44(4):695–729

  • Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52:591–611

  • Shiffrin RM, Schneider W (1977) Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychol Rev 84(2):127

  • Shneiderman B (1996) The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE symposium on visual languages. IEEE, pp 336–343

  • Siganos A, Vagenas-Nanos E, Verwijmeren P (2014) Facebook’s daily sentiment and international stock markets. J Econ Behav Organ 107:730–743

  • Stone AA, Schwartz JE, Broderick JE, Deaton A (2010) A snapshot of the age distribution of psychological well-being in the United States. Proc Natl Acad Sci 107(22):9985–9990

  • Taylor J (2009) Emotional experience and romantic relationship status in emerging adult college women and men. Colorado State University, Fort Collins

  • Thelwall M, Buckley K, Paltoglou G, Cai D, Kappas A (2010a) Sentiment strength detection in short informal text. J Am Soc Inf Sci Technol 61(12):2544–2558

  • Thelwall M, Wilkinson D, Uppal S (2010b) Data mining emotion in social network communication: gender differences in MySpace. J Am Soc Inf Sci Technol 61(1):190–199

  • Tukey JW (1977) Box-and-whisker plots. In: Exploratory data analysis, pp 39–43

  • van Leeuwen M, Knobbe A (2012) Diverse subgroup set discovery. Data Min Knowl Discov 25(2):208–242

  • Ware C (2012) Information visualization: perception for design. Elsevier, Amsterdam

  • Wrobel S (1997) An algorithm for multi-relational discovery of subgroups. In: Principles of data mining and knowledge discovery. Springer, pp 78–87

  • Yap SC, Anusic I, Lucas RE (2012) Does personality moderate reaction and adaptation to major life events? Evidence from the British household panel survey. J Res Personal 46(5):477–488

  • Yi JS, ah Kang Y, Stasko JT, Jacko JA (2007) Toward a deeper understanding of the role of interaction in information visualization. IEEE Trans Vis Comput Graph 13(6):1224–1231

  • Zhou MX, Feiner SK (1998) Visual task characterization for automated visual discourse synthesis. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co, pp 392–399

Acknowledgments

We thank Prof. Thelwall for his support with the SentiStrength tool. The research presented in this paper was supported by the Strategic Basic Research (SBO) Program of the Flemish Agency for Innovation via Science and Technology (IWT) through the project SPION (Grant No. 100048), and by the Fund for Scientific Research for Flanders (FWO) through the project Data Mining for Privacy in Social Networks (Grant No. G068611N).

Author information

Corresponding author

Correspondence to Bo Gao.

About this article

Cite this article

Gao, B., Berendt, B. & Vanschoren, J. Toward understanding online sentiment expression: an interdisciplinary approach with subgroup comparison and visualization. Soc. Netw. Anal. Min. 6, 68 (2016). https://doi.org/10.1007/s13278-016-0385-2

  • DOI: https://doi.org/10.1007/s13278-016-0385-2
