research-article

Digger: Detect Similar Groups in Heterogeneous Social Networks

Published: 19 December 2018

Abstract

People participate in multiple online social networks, such as Facebook, Twitter, and LinkedIn; these networks, with their heterogeneous social content and user relationships, are called heterogeneous social networks. Group structures are pervasive in heterogeneous social networks and reveal how human cooperation evolves. Detecting similar groups across heterogeneous networks, and exploiting the wealth of group information they carry, is significant for many applications, such as recommendation systems and spammer detection. Although promising, this novel problem poses a variety of technical challenges, including incomplete data, high time complexity, and the lack of ground truth. To address this research gap and these challenges, we model the problem as a ratio-cut optimization using the linear mixed-effects method and graph spectral theory. Based on this model, we propose an efficient algorithm called Digger to detect similar groups in large graphs. Digger consists of three steps: measuring user similarity, constructing a matching graph, and detecting similar groups. We adopt several strategies to lower the computational cost, and we detail the basis for labeling the ground truth. We evaluate the effectiveness and efficiency of our algorithm on five different types of online social networks. Extensive experiments show that our method achieves a precision of 0.693, a recall of 0.783, and an F1-measure of 0.735, surpassing the state of the art by 24.4%, 15.3%, and 20.7%, respectively. These results demonstrate that Digger detects similar groups in heterogeneous networks effectively.
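
The abstract names a ratio-cut objective and graph spectral theory without stating them. For reference, the standard ratio-cut formulation and its spectral relaxation from the spectral clustering literature (for example, Von Luxburg's tutorial on spectral clustering) are reproduced below. This is the textbook form, not necessarily the exact objective Digger optimizes: the symbols W, D, L, A_m, and H are introduced here for illustration, and the abstract does not say how the linear mixed-effects weights enter the affinity matrix W.

```latex
% Weighted graph on n nodes: affinity W, degree matrix D, unnormalized Laplacian L
W = (w_{ij}), \qquad D = \operatorname{diag}\Big(\textstyle\sum_{j} w_{ij}\Big), \qquad L = D - W

% Ratio cut of a partition A_1, ..., A_k of the node set
\operatorname{cut}(A, \bar{A}) = \sum_{i \in A,\; j \in \bar{A}} w_{ij}, \qquad
\operatorname{RatioCut}(A_1, \dots, A_k) = \sum_{m=1}^{k} \frac{\operatorname{cut}(A_m, \bar{A}_m)}{|A_m|}

% Scaled indicator vectors turn the ratio cut into a Laplacian quadratic form
h_{im} = \begin{cases} 1/\sqrt{|A_m|} & i \in A_m \\ 0 & \text{otherwise} \end{cases}
\;\Longrightarrow\;
h_m^{\top} L\, h_m = \tfrac{1}{2} \sum_{i,j} w_{ij}\, (h_{im} - h_{jm})^2 = \frac{\operatorname{cut}(A_m, \bar{A}_m)}{|A_m|}

\operatorname{RatioCut}(A_1, \dots, A_k) = \operatorname{Tr}(H^{\top} L H), \qquad H^{\top} H = I

% Spectral relaxation: drop the discreteness of H
\min_{H \in \mathbb{R}^{n \times k},\; H^{\top} H = I} \operatorname{Tr}(H^{\top} L H)
\quad \text{is attained by the } k \text{ eigenvectors of } L \text{ with the smallest eigenvalues.}
```

For k = 2 the first eigenvector is constant and carries no partition information, so the bipartition comes from thresholding the second eigenvector (the Fiedler vector) by sign.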
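
The three steps listed in the abstract can be illustrated with a small, self-contained Python sketch. Everything concrete in it is an assumption made for illustration, not Digger's published method: user similarity is plain cosine similarity over bag-of-words profiles (the abstract ties Digger's model to a linear mixed-effects method, which is not reproduced here); the matching graph uses hypothetical cross-network user pairs as nodes, linked when the corresponding friendships exist in both networks; and group detection is the standard two-way spectral relaxation of the ratio cut, computed by power iteration. The function names cosine, matching_graph, and fiedler_bipartition and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


# Step 1 (assumption): user similarity as cosine similarity of bag-of-words
# profiles. This stands in for Digger's own similarity measure.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 2 (hypothetical construction): nodes are cross-network user pairs
# (u, v) whose similarity passes a threshold; two pairs are linked when
# u1-u2 is an edge in network A and v1-v2 is an edge in network B.
def matching_graph(profiles_a, profiles_b, edges_a, edges_b, threshold=0.5):
    nodes = [(u, v) for u in profiles_a for v in profiles_b
             if cosine(profiles_a[u], profiles_b[v]) >= threshold]
    links_a = {frozenset(e) for e in edges_a}
    links_b = {frozenset(e) for e in edges_b}
    weights = {}
    for i, j in combinations(range(len(nodes)), 2):
        (u1, v1), (u2, v2) = nodes[i], nodes[j]
        if frozenset((u1, u2)) in links_a and frozenset((v1, v2)) in links_b:
            weights[(i, j)] = 1.0
    return nodes, weights


# Step 3: two-way spectral ratio-cut relaxation. The Fiedler vector of the
# unnormalized Laplacian L = D - W is found by power iteration on c*I - L,
# projecting out the constant eigenvector of L at every step.
def fiedler_bipartition(n, weights, iters=1000):
    if n < 2:
        return list(range(n)), []
    adj = [{} for _ in range(n)]
    for (i, j), w in weights.items():
        adj[i][j] = adj[j][i] = w
    deg = [sum(row.values()) for row in adj]
    c = 2 * max(deg) + 1.0  # exceeds the largest eigenvalue of L (Gershgorin)
    x = [i - (n - 1) / 2 for i in range(n)]  # deterministic start, sums to 0
    for _ in range(iters):
        # (c*I - L) x at node i = (c - deg_i) * x_i + sum_j w_ij * x_j
        y = [(c - deg[i]) * x[i] + sum(w * x[j] for j, w in adj[i].items())
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # keep the iterate orthogonal to all-ones
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0:
            break
        x = [v / norm for v in y]
    left = [i for i in range(n) if x[i] < 0]
    right = [i for i in range(n) if x[i] >= 0]
    return left, right


# Toy data (made up for illustration).
PROFILES_A = {"a1": Counter("football goal match".split()),
              "a2": Counter("football league match".split()),
              "a3": Counter("recipe baking bread".split())}
PROFILES_B = {"b1": Counter("football goal match".split()),
              "b2": Counter("football league match".split()),
              "b3": Counter("recipe baking bread".split())}
EDGES_A = [("a1", "a2"), ("a2", "a3")]
EDGES_B = [("b1", "b2")]

nodes, weights = matching_graph(PROFILES_A, PROFILES_B, EDGES_A, EDGES_B, threshold=0.9)
left, right = fiedler_bipartition(len(nodes), weights)
print([nodes[i] for i in left])  # [('a1', 'b1'), ('a2', 'b2')]
```

On the toy data, only (a1, b1) and (a2, b2) are linked in the matching graph, so the sign split of the Fiedler vector separates that consistent pair of correspondences from the isolated (a3, b3): the group {a1, a2} in network A is matched to {b1, b2} in network B. Power iteration is used instead of a full eigendecomposition only to keep the sketch dependency-free; a k-way split would instead cluster the rows of the k eigenvectors with the smallest eigenvalues.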
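
As a quick consistency check on the reported figures: the F1-measure is the harmonic mean of precision and recall, and a precision of 0.693 with a recall of 0.783 does reproduce the reported F1 of 0.735. The helper name f1_score is illustrative, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.693, 0.783), 3))  # 0.735
```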

Cited By

  • (2023) A Meta-Analysis of Facebook-Assisted Learning Outcomes in Different Countries or Regions. International Journal of Information Technology and Web Engineering 18, 1, 1-18. DOI: 10.4018/IJITWE.319312. Online publication date: 10-Mar-2023
  • (2022) Metaheuristic approaches for ratio cut and normalized cut graph partitioning. Memetic Computing 14, 3, 253-285. DOI: 10.1007/s12293-022-00365-w. Online publication date: 29-Apr-2022
  • (2020) Extracting Dense and Connected Communities in Dual Networks: An Alignment Based Algorithm. IEEE Access 8, 162279-162289. DOI: 10.1109/ACCESS.2020.3020924. Online publication date: 2020

Information

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 13, Issue 1
February 2019
340 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3301280
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2018
Accepted: 01 August 2018
Revised: 01 July 2018
Received: 01 January 2018
Published in TKDD Volume 13, Issue 1

Author Tags

  1. Heterogeneous networks
  2. detecting similar groups
  3. graph spectral
  4. linear mixed-effects

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Open Project Program of the National Laboratory of Pattern Recognition
  • National Natural Science Foundation of China

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Mar 2025
