ABSTRACT
Network embedding transforms a network into a continuous feature space. Network augmentation, in turn, uses this feature representation to obtain a more informative network by adding plausible edges and removing noisy ones. Traditional network embedding methods are often inefficient at capturing (i) the latent relationships when the network is sparse (the network sparsity problem), and (ii) the local and global neighborhood structure of vertices (the structure preserving problem).
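As a rough illustration of what augmentation from an embedding can look like, the sketch below adds an edge when two vertex embeddings are very similar and drops an existing edge when they are very dissimilar. The function name `augment`, the cosine-similarity criterion, and both thresholds are assumptions made for this example; the abstract does not state which rule is actually used.

```python
import numpy as np

def augment(adj, emb, add_thresh=0.9, drop_thresh=0.1):
    """Return a copy of a symmetric 0/1 adjacency matrix, edited by embedding similarity.

    Hypothetical rule (not taken from the paper): add an edge when cosine
    similarity is at least add_thresh, drop one when it is at most drop_thresh.
    Assumes every embedding vector is nonzero.
    """
    # Cosine similarity between every pair of vertex embeddings.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T

    out = adj.copy()
    out[(adj == 0) & (sim >= add_thresh)] = 1   # plausible missing edges
    out[(adj == 1) & (sim <= drop_thresh)] = 0  # likely noisy edges
    np.fill_diagonal(out, 0)                    # never introduce self-loops
    return out
```

With embeddings in which vertices 0 and 1 nearly coincide and vertex 2 is orthogonal to vertex 0, an initial edge between 0 and 2 would be removed and an edge between 0 and 1 would be added.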
We propose SENA, a structural embedding and network augmentation framework for social network analysis. Unlike other embedding methods, which generate only vertex features, SENA generates features for both vertices and relations (edges) by minimizing an objective function composed of a loss function and a regularization term. The loss function mitigates the network sparsity problem by learning from both the edges present in the network (true edges) and those absent from it (false edges), whereas the regularization term preserves the structural properties of the network by considering (i) the local neighborhood of vertices and edges, and (ii) the network spectra, i.e., the eigenvectors of a symmetric matrix representing the network.
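The abstract does not give the form of the objective, so the following is only a generic sketch of a "loss plus regularization" objective of this kind, not SENA's actual formulation. It assumes a translation-style score borrowed from TransE (one of the baselines) for the loss over true and false edges, and a graph Laplacian smoothness term standing in for the spectral regularizer. All names (`objective`, `margin`, `lam`) are hypothetical.

```python
import numpy as np

def objective(vertex_emb, edge_emb, true_edges, false_edges, adj, margin=1.0, lam=0.1):
    """Margin ranking loss over true/false edges plus a Laplacian penalty.

    Edges are integer rows (head, relation, tail). This is an assumed,
    illustrative objective, not the one defined in the paper.
    """
    def score(edges):
        # Translation-style distance ||head + relation - tail||; lower is more plausible.
        h, r, t = edges[:, 0], edges[:, 1], edges[:, 2]
        return np.linalg.norm(vertex_emb[h] + edge_emb[r] - vertex_emb[t], axis=1)

    # Loss: each true edge should score lower than its paired false edge by `margin`.
    loss = np.maximum(0.0, margin + score(true_edges) - score(false_edges)).sum()

    # Regularization: trace(X^T L X) with L = D - A. For a symmetric 0/1 adjacency
    # this is the sum, over undirected edges, of squared distances between the
    # embeddings of the two endpoints.
    lap = np.diag(adj.sum(axis=1)) - adj
    reg = np.trace(vertex_emb.T @ lap @ vertex_emb)
    return loss + lam * reg
```

The Laplacian term is one common way to tie an embedding to a network's spectrum: under an orthonormality constraint, trace(X^T L X) is minimized by the eigenvectors of L with the smallest eigenvalues, which is the idea behind Laplacian eigenmaps.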
We compare SENA with four baseline network embedding methods, namely DeepWalk, SE, SME and TransE, and demonstrate its efficacy through a task-based evaluation on different real-world networks. We consider state-of-the-art algorithms for (i) community detection, (ii) link prediction and (iii) knowledge graph query answering, and show that with SENA's representation these algorithms achieve up to 10%, 9% and (surprisingly) 108% higher accuracy, respectively, than with the best baseline embedding method.
Index Terms
- SENA: Preserving Social Structure for Network Embedding