DOI: 10.1145/3487553.3524726
research-article

Mining Homophilic Groups of Users using Edge Attributed Node Embedding from Enterprise Social Networks

Published: 16 August 2022

Abstract

We develop a method to identify groups of users who behave similarly and share similar work contexts, based on their activity on enterprise social media. Such groups would allow organizations to discover redundancies and increase efficiency. To better capture the network structure and communication characteristics, we model user communications as directed, attributed edges in a graph. Communication parameters, including engagement frequency, emotion words, and post lengths, act as the edge weights of each multiedge. On the resulting adjacency tensor, we develop a node embedding algorithm using higher-order singular value tensor decomposition and a convolutional autoencoder. We then develop a peer group identification algorithm that uses the cluster labels obtained from the node embedding, and show its results on the Enron emails and the StackExchange Workplace community. We observe that our method clusters together people who hold the same roles in enterprise social media. As a reference, we provide a comparison with existing node embedding algorithms, which indicates that attributed social networks and our formulations are an efficient and scalable way to identify peer groups in an enterprise social network, aiding professional social matching.
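
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy messages, the function names, the embedding dimension, and the plain k-means step with farthest-point seeding are all assumptions made for illustration, and the convolutional autoencoder stage is omitted entirely. The sketch builds a users × users × attributes adjacency tensor from directed attributed edges, takes the leading left singular vectors of its sender-mode and receiver-mode unfoldings (the factor-matrix step of a higher-order SVD), and clusters the resulting node vectors.

```python
import numpy as np

def build_adjacency_tensor(n_users, messages, n_attrs):
    """Accumulate directed, attributed edges into a (sender, receiver, attribute) tensor."""
    T = np.zeros((n_users, n_users, n_attrs))
    for sender, receiver, attrs in messages:
        T[sender, receiver, :] += np.asarray(attrs, dtype=float)
    return T

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the other axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_embedding(T, mode, dim):
    """Leading left singular vectors of one unfolding, scaled by their singular values."""
    U, s, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
    return U[:, :dim] * s[:dim]

def node_embedding(T, dim):
    """Concatenate sender-side (mode 0) and receiver-side (mode 1) vectors per user."""
    return np.hstack([mode_embedding(T, 0, dim), mode_embedding(T, 1, dim)])

def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point seeding (an illustrative choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical toy data: attributes are (engagement frequency, emotion words, post length).
# Users 0 and 1 write frequent, short, low-emotion posts to user 4;
# users 2 and 3 write rare, long, emotional posts to user 5.
messages = [
    (0, 4, [5, 1, 20]),
    (1, 4, [5, 1, 20]),
    (2, 5, [1, 4, 200]),
    (3, 5, [1, 4, 200]),
]

T = build_adjacency_tensor(n_users=6, messages=messages, n_attrs=3)
X = node_embedding(T, dim=2)
labels = kmeans(X, k=3)
print(labels)
```

In this toy setting, users with identical communication behaviour receive identical embedding rows and therefore the same cluster label. The attributes are left unscaled here, so post length dominates the distances; a real pipeline would likely normalise each attribute before decomposition.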

Supplementary Material

Presentation slides (slides.pdf)

Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN:9781450391306
DOI:10.1145/3487553
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. enterprise social media
  3. tensor node embedding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Indian Institute of Technology Kharagpur
  • Tata Consultancy Services Limited

Conference

WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 91
  • Downloads (Last 12 months): 23
  • Downloads (Last 6 weeks): 3

Reflects downloads up to 20 Jan 2025
