Elsevier

Information Sciences

Volume 512, February 2020, Pages 935-951
Improving network embedding with partially available vertex and edge content

https://doi.org/10.1016/j.ins.2019.09.083

Abstract

Network embedding aims to learn a low-dimensional representation for each vertex in a network, and it has recently proven effective in many graph mining problems such as vertex classification and link prediction. Most existing methods learn such representations from network structure information alone, and some methods further consider vertex content. Unlike prior work, we study the problem of network embedding with two distinctive properties: (1) content information exists on both vertices and edges; (2) only some of the vertices and edges have content information. To solve this problem, we propose a novel Partially available Vertex and Edge Content Boosted network embedding method, namely PVECB, which uses the available vertex and edge content information to fine-tune structure-only representations through two hand-designed mechanisms, one for each kind of content. Empirical results on four real-world datasets demonstrate that our method can effectively boost structure-only representations to capture more accurate proximities between vertices.

Introduction

Graphs (or networks) are widely used to model complex real-world systems such as online social networks (OSNs), biological networks, and communication networks, where the entities in these systems are modeled as vertices and the relations between entities are modeled as edges. Many practical problems are then converted into corresponding graph mining problems; for example, friend recommendation on Facebook and the who-to-follow service on Twitter become classic link prediction problems on graphs [6], [12], [32]. In recent years, network embedding techniques [3], [7], [16], [20] have gained popularity and have been shown to be effective in solving a wide range of graph mining problems, including link prediction, vertex classification, and community detection. The basic idea behind network embedding is to design a mapping that sends the vertices of a graph into a low-dimensional space while preserving meaningful structure information contained in the original graph. The mapped vertices, called vertex embedding vectors or vertex representations, make it possible to apply off-the-shelf machine learning algorithms to the aforementioned graph mining problems.

In the literature, most existing network embedding methods [3], [16], [20] mainly leverage network structure information to design the mapping: if two vertices are connected or have a small shortest-path distance, these methods tend to map them to two nearby locations in the low-dimensional space. Although embedding vectors learned by structure-only methods have been shown to be useful in some applications, they do not consider the rich semantic information carried by vertices and edges and may therefore yield inappropriate vertex representations. In real-world networks, besides structural information, vertices and edges often carry meaningful semantic information such as attributes and tags, and this semantic information is sometimes important when determining vertex representations. For example, consider the simple researcher social network from ResearchGate in Fig. 1. Here, a vertex represents a researcher, and two researchers are connected if they are friends, follow each other, or have co-authored at least one paper. We notice that R1 is an interdisciplinary researcher who is interested in both Computer Security and Machine Learning. R1 has a connection with R2, who is interested in Machine Learning, and another connection with R3, who is interested in Computer Security. If we omit these vertex and edge semantics and only consider structural information, structure-only embedding methods (such as the first-order proximity [20], which assumes all neighbors of a vertex are similar) will produce similar representations for vertices R2 and R3 in the low-dimensional space, implying that R2 and R3 have similar research interests, which contradicts our observation.
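The R2/R3 situation can be made concrete with a small sketch. It uses raw adjacency rows as a stand-in for a structure-only representation, which is a simplification and not the first-order proximity objective of [20]; the edge list, the interest sets, and all function names are assumptions introduced for illustration.

```python
import math

# Toy graph loosely following the Fig. 1 example: R1 is linked to R2 and R3.
# (Hypothetical encoding; the edge list and interest sets are assumptions.)
edges = [("R1", "R2"), ("R1", "R3")]
interests = {
    "R1": {"machine learning", "computer security"},
    "R2": {"machine learning"},
    "R3": {"computer security"},
}
vertices = sorted(interests)


def adjacency_row(v):
    """Structure-only signature: 1.0 where v shares an edge with the column vertex."""
    neighbours = {b for a, b in edges if a == v} | {a for a, b in edges if b == v}
    return [1.0 if u in neighbours else 0.0 for u in vertices]


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def jaccard(s, t):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    return len(s & t) / len(s | t) if s | t else 0.0


# R2 and R3 have the same single neighbour, so structure alone cannot separate them.
structural_sim = cosine(adjacency_row("R2"), adjacency_row("R3"))
# Their stated interests, however, do not overlap at all.
content_sim = jaccard(interests["R2"], interests["R3"])
print(structural_sim, content_sim)  # 1.0 0.0
```

In this toy setting the structural signatures of R2 and R3 are identical while their content similarity is zero, which is the mismatch the paragraph above describes.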

Present work. In this work, we investigate how to leverage structural information and content information to obtain better vertex representations. Using the example in Fig. 1, we briefly explain why both vertex content and edge content help in obtaining better vertex representations. First, the content information on vertices R2 and R3 indicates that the two researchers actually belong to two different research communities, i.e., Machine Learning and Computer Security, respectively. Thus, their representations in the low-dimensional space should not be too close to each other. Second, although researchers R4 and R5 lack vertex content, each of them has an edge with R1, and the edge content indicates that they have many research interests in common; hence their vertex representations in the low-dimensional space should be close to each other.

Inspired by the above observations, we come up with the idea of enhancing structure-only network embedding methods by leveraging the vertex and edge content information. However, we still face the following challenges.

Challenge 1: Missing vertex/edge content. It is very common that some vertices or edges in a network do not have content in real-world networks. For example, some researchers may not provide their research interests (vertex content missing). Or, a researcher may follow some researchers but has not co-authored with them yet (edge content missing). How to handle such missing vertex/edge content remains difficult.

Challenge 2: Semantic gaps between vertex and edge content. Usually, vertex and edge content describe characteristics of vertices within different contexts, and thus there is a semantic gap between them. For example, researchers tend to use some simple and general words to describe their research interests (vertex content), such as Machine Learning used by R2. On the contrary, researchers need to elaborate their papers (edge content) using more technical and specialized words (e.g., Bayesian inference and transductive inference). The semantic gap between vertex and edge content will give rise to inaccurate relationships between vertices, and how to solve this issue needs to be further studied.

In this paper, we develop a novel method that jointly leverages both vertex and edge content. More importantly, our method allows incomplete vertex and edge content, and hence we refer to the method as Partially available Vertex and Edge Content Boosted network embedding, namely PVECB. In particular, we address the two challenges above from the following two perspectives, respectively:

Fine-tuning with available content information. To address the first challenge, we first initialize embedding vectors of vertices according to the structure information and then fine-tune structure-only embedding vectors using available content information on vertices and edges. To some extent, the philosophy of fine-tuning can alleviate the problem of the incompleteness of content information.

Different functionalities of vertex and edge content. To address the second challenge, instead of gathering all content associated with a vertex together, we investigate the functionalities of vertex and edge content, and design two different objectives to fine-tune structure-only embedding vectors with them respectively.
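The fine-tuning idea in the two paragraphs above can be sketched in a few lines. This is not PVECB's actual objective, which this excerpt does not give. The sketch assumes a plain squared-error loss that pushes the dot product of two embedding vectors toward a Jaccard similarity of their content. The starting vectors, the extra vertex R6, and every function name are hypothetical. Vertices and edges without content are simply skipped, so their structure-only vectors are left untouched.

```python
from itertools import combinations

# Hypothetical structure-only starting vectors (2-D for readability).
init = {
    "R2": [1.0, 0.0], "R3": [0.9, 0.1],   # both have vertex content
    "R4": [0.5, 0.5], "R5": [0.5, -0.5],  # no vertex content, but content on their edge to R1
    "R6": [0.3, 0.7],                     # no content anywhere: never updated
}
vertex_content = {"R2": {"machine learning"}, "R3": {"computer security"},
                  "R4": None, "R5": None, "R6": None}
edge_content = {("R1", "R4"): {"bayesian inference"},
                ("R1", "R5"): {"bayesian inference"},
                ("R1", "R6"): None}


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def jaccard(s, t):
    return len(s & t) / len(s | t) if s | t else 0.0


def vertex_targets():
    """Pairs of vertices that both have content, with their content similarity."""
    known = [v for v, c in vertex_content.items() if c is not None]
    return [((a, b), jaccard(vertex_content[a], vertex_content[b]))
            for a, b in combinations(known, 2)]


def edge_targets():
    """Far endpoints of two content-bearing edges that share exactly one vertex."""
    known = [e for e, c in edge_content.items() if c is not None]
    out = []
    for e1, e2 in combinations(known, 2):
        shared = set(e1) & set(e2)
        if len(shared) == 1:
            (s,) = shared
            a = e1[0] if e1[1] == s else e1[1]
            b = e2[0] if e2[1] == s else e2[1]
            out.append(((a, b), jaccard(edge_content[e1], edge_content[e2])))
    return out


def fine_tune(start, targets, lr=0.1, steps=50):
    """Gradient descent on (dot(u, v) - target)^2 for every listed pair."""
    emb = {v: list(x) for v, x in start.items()}
    for _ in range(steps):
        for (a, b), s in targets:
            err = dot(emb[a], emb[b]) - s
            grad_a = [2 * err * y for y in emb[b]]
            grad_b = [2 * err * x for x in emb[a]]
            emb[a] = [x - lr * g for x, g in zip(emb[a], grad_a)]
            emb[b] = [y - lr * g for y, g in zip(emb[b], grad_b)]
    return emb


tuned = fine_tune(init, vertex_targets() + edge_targets())
```

With these inputs, R2 and R3 are pushed apart because their vertex content differs, R4 and R5 are pulled together because their edges to R1 carry the same content, and R6 keeps its initial vector. Using the same loss for both kinds of content is a shortcut taken for brevity; the text above states that the method designs two different objectives.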

Our main contributions are summarized as follows:

  • We propose a novel network embedding method that jointly leverages vertex and edge content to learn vertex representations. Our method tolerates missing vertex/edge content which makes it applicable in realistic scenarios.

  • We construct three real-world co-authorship networks, which include rich vertex and edge content that can be used to characterize relationships between vertices. Empirical results on these networks and on another social network demonstrate that our method achieves a significant performance improvement (up to around 18%) over the best-performing structure-only method in terms of Micro-F1 scores. In contrast, existing content-enhanced methods, such as TADW [30] and TriDNR [15], fail to handle missing content information, and they even perform worse than the best-performing structure-only method.
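For readers unfamiliar with the Micro-F1 score mentioned above, the following minimal sketch computes it for multi-label vertex classification using the standard micro-averaged definition, 2·TP / (2·TP + FP + FN), with counts pooled over all vertices and labels. The label names and predictions are made-up illustrations, not data from the paper.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1: pool TP/FP/FN over every (vertex, label) decision."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Three vertices, each with a set of true labels and a set of predicted labels.
y_true = [{"ML"}, {"Security"}, {"ML", "Security"}]
y_pred = [{"ML"}, {"ML"}, {"ML"}]
score = micro_f1(y_true, y_pred)
print(round(score, 4))  # 0.5714  (TP=2, FP=1, FN=2 -> 4/7)
```

Because the counts are pooled before the ratio is taken, frequent labels weigh more heavily in Micro-F1 than rare ones.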

The remainder of this paper will proceed as follows. Section 2 summarizes related work. Section 3 introduces some background knowledge. Section 4 formulates the studied problem and introduces the proposed method PVECB in detail. Section 5 presents experimental results, and Section 6 concludes.

Section snippets

Related work

In this section, we summarize the related literature of network embedding methods from two aspects: methods that only use network structure information, and methods that also use auxiliary information such as content information and label information.

Preliminaries

In this section, we first give some background knowledge about structure-only network embedding methods and then analyze the drawbacks of the existing content-enhanced methods. Finally, we introduce a kind of deep learning technique, which is employed to extract features from content information in this paper.

Our proposed approach

In this section, we first formulate our studied problem. Then, we describe PVECB in detail, and discuss the model optimization.

Experiments

In order to validate the effectiveness of our model, we compare our method with several state-of-the-art methods on multi-label vertex classification. The empirical results demonstrate that our proposed method can effectively enhance the performance of structure-only network embedding methods with partially available vertex and edge content. Let us first briefly introduce the datasets and experimental settings.

Conclusion

In this paper, we propose a novel method PVECB to boost the existing structure-only network embedding method with available vertex and edge content. PVECB employs two kinds of mechanisms to fine-tune structure-only embedding vectors with vertex content and edge content respectively. Its attractive property is to allow us to implicitly diffuse content information over the network and learn effective vertex embedding vectors even when only a small portion of vertices and edges have content

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The research presented in this paper is supported in part by Shenzhen Basic Research Grant (JCYJ20170816100819428), National Key R&D Program of China (2018YFC0830500), National Natural Science Foundation of China (61922067, U1736205, 61603290), Natural Science Basic Research Plan in Shaanxi Province of China (2019JM-159), and Natural Science Basic Research Plan in Zhejiang Province of China (LGG18F020016).

Lin Lan received the B.S. in automation engineering from Xi’an Jiaotong University, Xi’an, P.R. China, in 2016. He is currently a Ph.D. student with NSKEYLAB at Xi’an Jiaotong University. His research interests include network analysis, network representation learning, and deep learning and its applications.

References (32)

  • R. Salakhutdinov et al. Semantic hashing. Int. J. Approx. Reason. (2009)
  • T. Baldwin et al. How noisy social media text, how different social media sources? Proceedings of the Sixth International Joint Conference on Natural Language Processing (2013)
  • M. Belkin et al. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems (2002)
  • S. Cao et al. GraRep: Learning graph representations with global structural information. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (2015)
  • H. Chen et al. Enhanced network embeddings via exploiting edge labels. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (2018)
  • T.F. Cox et al. Multidimensional Scaling (2000)
  • Y. Dong et al. Link prediction and recommendation across heterogeneous social networks. IEEE 12th International Conference on Data Mining (2012)
  • A. Grover et al. Node2vec: scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
  • D.P. Kingma et al. Stochastic gradient VB and the variational auto-encoder. Second International Conference on Learning Representations, ICLR (2014)
  • Q. Le et al. Distributed representations of sentences and documents. Proceedings of the 31st International Conference on Machine Learning (2014)
  • O. Levy et al. Neural word embedding as implicit matrix factorization. Advances in Neural Information Processing Systems (2014)
  • H. Li et al. Variation autoencoder based network representation learning for classification. Proceedings of ACL 2017, Student Research Workshop (2017)
  • D. Liben-Nowell et al. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. (2007)
  • T. Mikolov et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (2013)
  • M. Ou et al. Asymmetric transitivity preserving graph embedding. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
  • S. Pan et al. Tri-party deep network representation. Network (2016)
    Pinghui Wang received the Ph.D. degree in control science and engineering from Xi’an Jiaotong University, Xi’an, P.R. China. From April 2012 to October 2012, he was a postdoctoral researcher with the Department of Computer Science and Engineering at The Chinese University of Hong Kong. From October 2012 to July 2013, he was a postdoctoral researcher with the School of Computer Science at McGill University, QC, Canada. He is currently an associate professor with the Department of Automation at Xi’an Jiaotong University. His research interests include Internet traffic measurement and modeling, traffic classification, abnormal detection, and online social network measurement.

    Junzhou Zhao received the Ph.D. degree in control science and engineering from Xi’an Jiaotong University, Xi’an, P.R. China. He is currently a postdoctoral researcher at King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. His research focuses on mining and measuring massive large-scale networks/graphs, with a particular interest in online social networks, a.k.a. network science.

    Jing Tao received the B.S. and M.S. degrees in automatic control from Xi’an Jiaotong University, Xi’an, China, in 2001 and 2006, respectively. He is currently a teacher at Xi’an Jiaotong University and an on-the-job Ph.D. candidate with the Systems Engineering Institute and SKLMS Laboratory, Xi’an Jiaotong University, under the supervision of Prof. Xiaohong Guan. His research interests include Internet traffic measurement and modeling, traffic classification, abnormal detection, and botnets.

    John C.S. Lui received the Ph.D. degree in computer science from UCLA. He is currently a professor in the Department of Computer Science and Engineering at The Chinese University of Hong Kong. His current research interests include communication networks, network/system security (e.g., cloud security, mobile security, etc.), network economics, network sciences (e.g., online social networks, information spreading, etc.), cloud computing, large-scale distributed systems and performance evaluation theory. He serves on the editorial boards of IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, Journal of Performance Evaluation and International Journal of Network Security. He was the chairman of the CSE Department from 2005 to 2011. He received various departmental teaching awards and the CUHK Vice-Chancellor’s Exemplary Teaching Award. He is also a corecipient of the IFIP WG 7.3 Performance 2005 and IEEE/IFIP NOMS 2006 Best Student Paper Awards. He is an elected member of the IFIP WG 7.3, a Fellow of the ACM, a Fellow of the IEEE, and a Croucher Senior Research Fellow.

    Xiaohong Guan received the B.S. and M.S. in automatic control from Tsinghua University, Beijing, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from the University of Connecticut, Storrs, US, in 1993. From 1993 to 1995, he was a consulting engineer at PG&E. From 1985 to 1988, he was with the Systems Engineering Institute, Xi’an Jiaotong University, Xi’an, China. From January 1999 to February 2000, he was with the Division of Engineering and Applied Science, Harvard University, Cambridge, MA. Since 1995, he has been with the Systems Engineering Institute, Xi’an Jiaotong University, and was appointed Cheung Kong Professor of Systems Engineering in 1999, and dean of the School of Electronic and Information Engineering in 2008. Since 2001 he has been the director of the Center for Intelligent and Networked Systems, Tsinghua University, and served as head of the Department of Automation, 2003–2008. He is an Editor of IEEE Transactions on Power Systems and an Associate Editor of Automatica. His research interests include allocation and scheduling of complex networked resources, network security, and sensor networks. He has been elected Fellow of IEEE.
