Mining Homophilic Groups of Users using Edge Attributed Node Embedding from Enterprise Social Networks

Authors:

Priyanka Sinha,

Lipika Dey

WWW '22: Companion Proceedings of the Web Conference 2022

Pages 1139 - 1147

https://doi.org/10.1145/3487553.3524726

Published: 16 August 2022

Abstract

We develop a method to identify groups of similarly behaving users with similar work contexts from their activity on enterprise social media. This would allow organizations to discover redundancies and increase efficiency. To better capture the network structure and communication characteristics, we model user communications as directed, attributed edges in a graph. Communication parameters, including engagement frequency, emotion words, and post lengths, act as the edge weights of each multiedge. On the resulting adjacency tensor, we develop a node embedding algorithm using higher-order singular value tensor decomposition and a convolutional autoencoder. We develop a peer group identification algorithm using the cluster labels obtained from the node embedding and show its results on Enron emails and the StackExchange Workplace community. We observe that people in the same roles in enterprise social media are clustered together by our method. We provide a comparison with existing node embedding algorithms as a reference, indicating that attributed social networks and our formulations are an efficient and scalable way to identify peer groups in an enterprise social network, which aids professional social matching.
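
As a rough illustration of the data model described in the abstract, the sketch below builds a small adjacency tensor from directed, attributed communications. It is not the authors' implementation: the message records, the function names `build_adjacency_tensor` and `unfold_rows`, and the exact choice of three attributes per edge (message count, emotion-word count, mean post length) are assumptions made for this example. The tensor decomposition, convolutional autoencoder, and clustering steps are deliberately left out.

```python
from collections import defaultdict

# Hypothetical message records (assumption, not from the paper's datasets):
# (sender, receiver, emotion_word_count, post_length)
messages = [
    ("alice", "bob", 2, 120),
    ("alice", "bob", 0, 80),
    ("bob", "carol", 1, 40),
    ("carol", "alice", 3, 200),
]


def build_adjacency_tensor(messages):
    """Return (users, tensor), where tensor[i][j] holds the attributes
    [frequency, emotion_words, mean_post_length] of the directed edge
    from users[i] to users[j]."""
    users = sorted({u for s, r, _, _ in messages for u in (s, r)})
    index = {u: i for i, u in enumerate(users)}
    n = len(users)

    # Accumulate raw totals per directed pair: [count, emotion words, total length].
    totals = defaultdict(lambda: [0, 0, 0])
    for sender, receiver, emotion_words, length in messages:
        t = totals[(index[sender], index[receiver])]
        t[0] += 1
        t[1] += emotion_words
        t[2] += length

    # Pairs with no communication keep an all-zero attribute vector.
    tensor = [[[0.0, 0.0, 0.0] for _ in range(n)] for _ in range(n)]
    for (i, j), (count, emotion, length) in totals.items():
        tensor[i][j] = [float(count), float(emotion), length / count]
    return users, tensor


def unfold_rows(tensor):
    """Flatten each sender's outgoing slice into one feature row
    (a mode-1 style unfolding, up to column ordering)."""
    return [[value for edge in row for value in edge] for row in tensor]


users, tensor = build_adjacency_tensor(messages)
rows = unfold_rows(tensor)
```

Each row of `rows` is a per-user feature vector that a decomposition or embedding step could take as input. Direction is preserved, so `tensor[0][1]` (alice to bob) and `tensor[1][0]` (bob to alice) are distinct entries.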

Supplementary Material

Presentation slides (slides.pdf)


Published In

WWW '22: Companion Proceedings of the Web Conference 2022

April 2022

1338 pages

ISBN: 9781450391306

DOI: 10.1145/3487553

Editors: Frédérique Laforest (INSA Lyon, France), Raphaël Troncy (EURECOM, France), Lionel Médini (Université Lyon 1, France), Ivan Herman (W3C / retired)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Indian Institute of Technology Kharagpur
Tata Consultancy Services Limited

Conference

WWW '22

Sponsor: SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
