ABSTRACT
More often than not, people are active in more than one social network. Identifying users from multiple heterogeneous social networks and integrating the different networks is a fundamental issue in many applications. The existing methods tackle this problem by estimating pairwise similarity between users in two networks. However, those methods suffer from potential inconsistency of matchings between multiple networks.
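To make the inconsistency problem concrete, the following is a minimal sketch, not the paper's method. It assumes three networks A, B and C whose pairwise matchings are given as plain dictionaries; the function name `inconsistent_users` and all account names are hypothetical. It flags users for whom the direct A-to-C match disagrees with the match obtained by going through B.

```python
def inconsistent_users(a_to_b, b_to_c, a_to_c):
    """Return matches where A->B->C disagrees with the direct A->C match.

    Each argument is a dict mapping an account in one network to its
    predicted counterpart in another (hypothetical input format).
    """
    conflicts = []
    for user_a, user_b in a_to_b.items():
        via_b = b_to_c.get(user_b)    # C account reached through B
        direct = a_to_c.get(user_a)   # C account matched directly
        # Only compare when both paths produce a prediction.
        if via_b is not None and direct is not None and via_b != direct:
            conflicts.append((user_a, user_b, via_b, direct))
    return conflicts


# Illustrative, made-up pairwise matchings between three networks.
a_to_b = {"alice_tw": "alice_fb", "bob_tw": "bob_fb"}
b_to_c = {"alice_fb": "alice_li", "bob_fb": "robert_li"}
a_to_c = {"alice_tw": "alice_li", "bob_tw": "bobby_li"}

conflicts = inconsistent_users(a_to_b, b_to_c, a_to_c)
print(conflicts)
```

Each pairwise matcher here may look reasonable in isolation; the conflict only becomes visible when the three matchings are considered jointly.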
In this paper, we propose COSNET (COnnecting heterogeneous Social NETworks with local and global consistency), a novel energy-based model, to address this problem by considering both local and global consistency among multiple networks. An efficient subgradient algorithm is developed to train the model by converting the original energy-based objective function into its dual form.
We evaluate the proposed model on two different genres of data collections: SNS and Academia, each consisting of multiple heterogeneous social networks. Our experimental results validate the effectiveness and efficiency of the proposed model. On both data collections, COSNET significantly outperforms several alternative methods, by 10-30% in terms of F1-score (p ≪ 0.001, t-test). We also demonstrate that applying the integration results produced by our method can improve the accuracy of expert finding, an important task in social networks.
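For reference, F1-score on a user-matching task can be computed over sets of predicted and true account pairs. The sketch below is an assumption about how such an evaluation could be set up, not the paper's evaluation code; the function name `f1_score` and the example pairs are hypothetical.

```python
def f1_score(predicted_pairs, true_pairs):
    """F1 (harmonic mean of precision and recall) over matched account pairs."""
    predicted, truth = set(predicted_pairs), set(true_pairs)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 3 predicted pairs, 2 of them correct, 4 true pairs.
predicted = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

score = f1_score(predicted, truth)  # precision 2/3, recall 1/2
print(round(score, 4))
```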