DOI: 10.1145/3366424.3383296
research-article

A Hierarchical Clustering Algorithm for Characterizing Social Media Users

Published: 20 April 2020

Abstract

In this paper we propose a method to characterize user behavior from engagement with enterprise social media. Content analysis is often hampered by noise, so we instead study behavior through temporal activity, i.e., the number of posts per month represented as a time series. User posting volume on social media is long-tailed, which causes time series clustering algorithms to produce unbalanced clusters containing either very few users or almost all users. We therefore propose a hierarchical time series clustering algorithm that groups users by behavioral homogeneity and provides interpretable characterizations of the resulting clusters. Users in distinct clusters differ significantly in their topics of interest while being homophilic (near-identical or similar-minded) within a cluster. The clustering compares favorably with existing clustering techniques on the Enterprise Social Media (ESM), Stackexchange, and Linux Kernel Mailing List (LKML) datasets.
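
The idea of clustering users by their monthly posting time series can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. The log scaling, the farthest-pair bisection, and the `max_size` stopping rule are all assumptions chosen for illustration, and the function names and user data are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def monthly_series(post_dates, start, months):
    """Count posts per calendar month; `start` is a (year, month) tuple."""
    counts = Counter(
        (d.year - start[0]) * 12 + (d.month - start[1]) for d in post_dates
    )
    return [counts.get(i, 0) for i in range(months)]


def distance(a, b):
    """Euclidean distance on log-scaled counts (assumed way to damp the long tail)."""
    return math.sqrt(
        sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b))
    )


def bisect(users, series):
    """Split users into two groups around the two most distant members."""
    seed_a, seed_b = max(
        ((u, v) for i, u in enumerate(users) for v in users[i + 1:]),
        key=lambda pair: distance(series[pair[0]], series[pair[1]]),
    )
    left, right = [], []
    for u in users:
        d_a = distance(series[u], series[seed_a])
        d_b = distance(series[u], series[seed_b])
        (left if d_a <= d_b else right).append(u)
    return left, right


def hierarchical_clusters(series, max_size):
    """Recursively bisect every cluster larger than `max_size` (must be >= 1)."""
    pending, done = [sorted(series)], []
    while pending:
        cluster = pending.pop()
        if len(cluster) <= max_size:
            done.append(cluster)
            continue
        left, right = bisect(cluster, series)
        if not left or not right:  # identical series cannot be split further
            done.append(cluster)
            continue
        pending.extend([left, right])
    return sorted(done)


# Hypothetical users: two heavy posters and four occasional posters.
series = {
    "alice": [40, 35, 50, 45],
    "bob": [38, 30, 47, 44],
    "carol": [1, 0, 2, 1],
    "dave": [0, 1, 1, 0],
    "erin": [0, 0, 1, 0],
    "frank": [1, 1, 0, 1],
}
clusters = hierarchical_clusters(series, max_size=4)
```

With these made-up series, the heavy posters end up in one cluster and the occasional posters in another. Lowering `max_size` forces further splits of the larger group, which is one simple way to keep a single cluster from absorbing almost all users.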

        Published In

        WWW '20: Companion Proceedings of the Web Conference 2020
        April 2020
        854 pages
        ISBN: 9781450370240
        DOI: 10.1145/3366424
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. clustering
        2. employee behavior
        3. enterprise social media
        4. time series

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '20
        WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
