Social network alignment, identifying social accounts of the same individual across different social networks, is of fundamental importance in a wide spectrum of applications, such as link prediction and information diffusion. Individuals more often than not join multiple social networks, yet it is often prohibitively expensive, or even impossible, to acquire supervision for guiding the alignment. To the best of our knowledge, few methods in the literature can align multiple social networks without supervision. In this article, we propose to study the problem of unsupervised multiple social network alignment. To address this problem, we propose a novel unsupervised model of joint Matrix factorization with a diagonal Cone under orthogonal Constraint, referred to as MC2. Its core idea is to embed and align multiple social networks in a common subspace via an unsupervised approach. Specifically, in the MC2 model, we first design a matrix optimization to infer the common subspace from different social networks. To address the nonconvex optimization, we then design an efficient alternating algorithm by leveraging its inherent functional property. Through extensive experiments on real-world datasets, we demonstrate that the proposed MC2 model significantly outperforms the state-of-the-art methods.
1 Introduction
Social network alignment, identifying social accounts of the same individual across different social networks, is of fundamental importance across a wide spectrum of applications, ranging from network fusion [38], link prediction [43], and information diffusion [35] to cross-domain recommendation [16]. Consequently, it is receiving increasing attention from both industry and academia.
A series of methods have been proposed for social network alignment; however, they still suffer from the following limitations:
–
Supervision: Most existing methods [8, 12, 14, 17, 34] rely on known anchor users across these networks, which act as supervision. However, since social platforms do not share their information, acquiring such supervision is in fact intractable or even impossible in most real-world scenarios.
–
Multiplicity: Most existing methods [15, 17, 44] focus on aligning a pair of social networks. In reality, individuals often participate in multiple social networks. However, due to the global inconsistency [40], these methods cannot be extended to align multiple social networks. Therefore, researchers have recently begun to investigate the general problem of multiple social network alignment.
To the best of our knowledge, few methods in the literature perform unsupervised multiple network alignment. These limitations motivate us to rethink: can we align multiple social networks via an unsupervised method?
The answer is yes. In different social networks, the same individual exhibits similar correlations with others, as evidenced by sociology [21, 28, 31]. This characteristic can be leveraged for social network alignment without supervision. Take Figure 1 as an example: Figure 1(a) illustrates three different social networks. Since these three social networks share a similar structure, it is likely that \(v^{(1)}_a = v^{(2)}_a = v^{(3)}_a\). In this article, we propose to study the problem of unsupervised multiple network alignment. To address this problem, we design a novel Matrix factorization model with diagonal Cone under orthogonal Constraint, referred to as MC\(^2\). The essential novelty of the proposed MC\(^2\) model lies in that it embeds and aligns multiple social networks without supervision.
Fig. 1. The MC\(^2\) model overview. The input of the MC\(^2\) model is the proximity matrices of the social networks, as shown in Figure (a). The MC\(^2\) model infers the common subspace without supervision via a Matrix factorization with diagonal Cone under orthogonal Constraint, as shown in Figure (b). The MC\(^2\) model outputs the embeddings of these networks in the inferred common subspace, and we finally align the user identities, as shown in Figure (c).
However, performing social network alignment without supervision still faces tremendous challenges in both modeling and optimization.
–
Modeling: Although aligning multiple social networks without supervision is significant, it largely remains an open problem. A few studies [11, 44] try to align only a pair of social networks without supervision; however, they cannot be extended to multiple-network alignment. Moreover, a few studies [18, 24] aim at aligning multiple networks, yet they still heavily rely on supervision. Besides, the very few studies [37] that attempt to align multiple networks without supervision do so pairwise: given m social networks, they need to conduct \(C_m^2\) pairwise alignments. This calls for a unified approach to multiple social network alignment without supervision.
–
Optimization: The underlying optimization problem tends to be nonconvex due to the inherent complexity of the alignment task. For nonconvex optimization, there is in general no effective method to approach the optimum, which renders this problem considerably more challenging.
To address the first challenge (Modeling), we design a novel matrix factorization model to perform network alignment without supervision, as shown in Figure 1. Specifically, we first infer the optimal common subspace from the proximity matrices in Figure 1(a). To this end, we formulate a Matrix factorization model with diagonal Cone, where the common base matrix is under orthogonal Constraint, as shown in Figure 1(b). Then, we embed each social network into the common subspace to align multiple social networks without supervision, as shown in Figure 1(c). Hence, we obtain the embedding of each social account, depicting its underlying intrinsic identity, to perform unsupervised alignment. The merit of the MC\(^2\) model lies in that we unfold the intrinsic identity to identify the social accounts of the same individual.
To address the second challenge (Optimization), we propose an efficient alternating algorithm for the nonconvex optimization in the MC\(^2\) model. That is, we alternately optimize the objective function w.r.t. one variable matrix while fixing the others. Specifically, for the base matrix of the common subspace, we prove that the gradient function of this subproblem is Lipschitz continuous. Thus, we leverage the fast Nesterov's gradient descent with an \(O(1/l^2)\) convergence rate. For the matrices on the diagonal matrix cone of each social network, we leverage the orthogonal constraint in the optimization and derive the closed-form solution with the aid of Lagrangian multipliers. The merit of the proposed alternating algorithm lies in that it efficiently optimizes each subproblem by leveraging its inherent functional property.
Through extensive experiments on real-world datasets, we demonstrate that the proposed MC\(^2\) model achieves up to 1.9\(\times\) the precision of the state-of-the-art approaches.
To summarize, we highlight the following contributions:
–
Problem. In this article, we propose to study the problem of unsupervised multiple social network alignment in a unified way, which has rarely been touched in the literature.
–
Methodology. We design a novel unified unsupervised MC\(^2\) model, which effectively embeds and aligns multiple social networks without any supervision (i.e., anchor users known a priori).
–
Optimization. We design an efficient alternating algorithm for the nonconvex optimization in the MC\(^2\) model, leveraging its inherent functional property.
–
Experiment. Extensive experiments on real-world datasets demonstrate the evident superiority of the proposed MC\(^2\) model.
Roadmap. The rest of this article is organized as follows: Section 2 defines the problem of unsupervised multiple social network alignment. Sections 3 and 4 introduce the modeling and optimization of the proposed MC\(^2\) model, respectively. We present experimental results in Section 5, summarize related work in Section 6, and finally conclude our work in Section 7.
2 Problem Definition
There is a set of M social networks \(\lbrace {{S^{(m)}}} \rbrace _{m=1}^M\). We use the superscript \(^{(m)}\) to indicate notation associated with the social network \(S^{(m)}\) throughout this article. A social network \({S^{(m)}}\) owns \({N^{(m)}}\) social accounts \(\lbrace v_{(\cdot)}^{(m)} \rbrace\). We utilize a structure proximity matrix \({{\mathbf {W}}^{(m)}} \in {\mathbb {R}^{n \times n}}\) and an attribute similarity matrix \({{\mathbf {M}}^{(m)}} \in {\mathbb {R}^{n \times n}}\) to represent the pairwise proximity of the structure and attribute features of these social accounts, respectively, where n is the number of social accounts in the largest social network. \({\mathbf {W}^{(m)}}\) and \({\mathbf {M}^{(m)}}\) are symmetric, as the pairwise proximity is considered undirected. The element \(\mathbf {W}_{i,j}^{(m)}\) indicates the connection strength of structure features between accounts \(v_i^{(m)}\) and \(v_j^{(m)}\), and the element \(\mathbf {M}_{i,j}^{(m)}\) indicates the connection strength of attribute features between them; both can be either binary or any nonnegative value. In this article, we propose to align multiple social networks without any supervision, whether or not the social networks are weighted.
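For concreteness, the following minimal sketch builds such a symmetric, binary structure proximity matrix from an undirected edge list; the helper `structure_proximity` and the zero-padding of smaller networks to the size n of the largest network are our own illustrative assumptions:

```python
import numpy as np

def structure_proximity(edges, n):
    """Build a symmetric binary structure proximity matrix W from an
    undirected edge list (a hypothetical helper for illustration; smaller
    networks are zero-padded to the size n of the largest network)."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0  # undirected, hence symmetric
    return W

# Toy example: a 4-account network padded to n = 5.
W = structure_proximity([(0, 1), (1, 2), (2, 3)], n=5)
assert np.allclose(W, W.T)  # pairwise proximity is undirected
```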
We summarize the main notations in Table 1 and formally define the problem of unsupervised multiple social network alignment as follows:
Table 1. Main Notations and Definitions

| Notation | Definition |
| --- | --- |
| \(\lbrace {{S^{(m)}}} \rbrace _{m=1}^M\) | the social network set |
| \(\mathbf {D}\) | the proximity matrix of a social network |
| \(\mathbf {W}\) | the structure proximity matrix of a social network |
| \(\mathbf {M}\) | the attribute proximity matrix of a social network |
| \(\mathbf {B}\) | the matrix of eigenvalues |
| \(\mathbf {H}\) | the base matrix of the common subspace |
| \(\mathbf {H}_1\) | the base matrix of the common structure subspace |
| \(\mathbf {H}_2\) | the base matrix of the common attribute subspace |
| \(\mathbf {V}\) | the matrix of embeddings |
| r | the dimension of the common subspace |
| n | the account number of the largest network |
| L | the Lipschitz constant |
Given a set of social networks \(\lbrace {{S^{(m)}}} \rbrace _{m=1}^M\) without any supervision, the problem of unsupervised multiple social network alignment is: for each \(S^{(m)}\), find a mapping \(\phi ^{(m)}\) from each social account to its owner, such that \(\phi^{(m)}(v_i^{(m)}) = \phi^{(m^{\prime })}(v_j^{(m^{\prime })})\) holds if and only if \(v_i^{(m)}\) and \(v_j^{(m^{\prime })}\) belong to the same individual, thereby identifying social accounts belonging to the same individual.
To address the problem of unsupervised multiple social network alignment, we design a novel Matrix factorization model with diagonal Cone under orthogonal Constraint, termed as MC\(^2\) (Section 3). To address the optimization, we design an effective alternating optimization algorithm (Section 4).
3 MC2: Modeling
To address the problem of unsupervised multiple social network alignment, we design a novel Matrix factorization model with diagonal Cone under orthogonal Constraint, termed MC\(^2\), as shown in Figure 1. In different social networks, the same individual exhibits similar correlations with others, as evidenced by sociology [21, 28, 31]. This characteristic can be leveraged for social network alignment without supervision.
Recall the example in Figure 1(a). Since these nodes share commonalities in structure and attributes, i.e., they play similar roles in their corresponding whole networks, it is likely that \(v^{(1)}_a = v^{(2)}_a = v^{(3)}_a\). Naturally, the common subspace comprises a common structure subspace and a common attribute subspace. The common subspace thus needs to be constructed so that each person has the same or a similar vector representation in it, even though he/she holds multiple different accounts in these social networks. As a vector space is spanned by its bases, the goal of our model is to identify the common bases.
In MC\(^2\), we first infer the common bases for multiple social networks in Section 3.1 without any supervision and then embed these networks into such common subspace in Section 3.2. Finally, we perform the user identity alignment by comparing the similarities of different accounts in this common space in Section 3.3.
3.1 Common Bases Inferring
This section focuses on how to infer the common structure bases and common attribute bases from different social networks. Note that, we use the structure proximity matrix \(\mathbf {W}^{(m)} \in {\mathbb {R}}^{n \times n}\) and attribute proximity matrix \(\mathbf {M}^{(m)} \in {\mathbb {R}}^{n \times n}\) to represent the corresponding social network.
First, according to the eigendecomposition of a symmetric matrix, \(\mathbf {W}^{(m)}\) can be represented as follows:
\[ \mathbf {W}^{(m)} = {{\bf H}_1}_{eig}^{(m)}\, {{\bf B}_1}_{eig}^{(m)}\, \left({{\bf H}_1}_{eig}^{(m)}\right)^T, \]
where \({{\bf H}_1}_{eig}^{(m)}\) and \({{\bf B}_1}_{eig}^{(m)}\) are the eigenvector matrix and eigenvalue matrix of the structure proximity matrix \(\mathbf {W}^{(m)}\), respectively.
For \({{\bf H}_1}_{eig}^{(m)}\), it can be represented with a common base matrix \(\mathbf {H}_1\) and a specific base matrix \({{\bf H}_1}_{s}^{(m)}\), i.e., \([ {{{\bf H}_1},{{\bf H}_1}_s^{(m)}} ]\). For \({{\bf B}_1}_{eig}^{(m)}\), it can also be represented with two matrices, i.e., \(\left[ {\begin{matrix} {{{{\bf B}}^{(m)}}}&{}\\ {}&{{{\bf B}_1}_s^{(m)}} \end{matrix}} \right]\), where \({{{{\bf B}}^{(m)}}}\) and \({{\bf B}_1}_{s}^{(m)}\) denote the diagonal cone matrices of the common subspace and the specific subspace, respectively. Then, as shown in Figure 2, \({{\bf W}}^{(m)}\) can be reformulated as:
\[ \mathbf {W}^{(m)} = \mathbf {H}_1\, \mathbf {B}^{(m)}\, \mathbf {H}_1^T + {{\bf H}_1}_s^{(m)}\, {{\bf B}_1}_s^{(m)}\, \left({{\bf H}_1}_s^{(m)}\right)^T. \]
Note that, the bases of the common subspace should be orthogonal; since no inverse exists for an \(n \times r\) matrix, we relax the orthogonality constraint to \(\mathbf {H_1}^T\mathbf {H_1}=\mathbf {I}\).
Fig. 2. Common bases inferring.
Then, the attribute proximity matrix \(\mathbf {M}^{(m)} \in {\mathbb {R}}^{n \times n}\) can be represented as follows:
\[ \mathbf {M}^{(m)} = {{\bf H}_2}_{eig}^{(m)}\, {{\bf B}_2}_{eig}^{(m)}\, \left({{\bf H}_2}_{eig}^{(m)}\right)^T, \]
where \({{\bf H}_2}_{eig}^{(m)}\) and \({{\bf B}_2}_{eig}^{(m)}\) are the eigenvector matrix and eigenvalue matrix of the attribute proximity matrix \(\mathbf {M}^{(m)}\), respectively.
Similarly, \({{\bf H}_2}_{eig}^{(m)}\) can consist of a common base matrix and a specific base matrix, \([ {{{\bf H}_2},{{\bf H}_2}_s^{(m)}} ]\). It should be noted that, due to the similarity between structure and attribute features, \({{\bf B}_2}_{eig}^{(m)}\) can also be represented with two matrices, i.e., \(\left[ {\begin{matrix} {{{{\bf B}}^{(m)}}}&{}\\ {}&{{{\bf B}_2}_s^{(m)}} \end{matrix}} \right]\), where \({{{{\bf B}}^{(m)}}}\) and \({{\bf B}_2}_{s}^{(m)}\) denote the diagonal cone matrices of the common subspace and the specific subspace, respectively. As a result, \({{\bf M}}^{(m)}\) can be reformulated as
\[ \mathbf {M}^{(m)} = \mathbf {H}_2\, \mathbf {B}^{(m)}\, \mathbf {H}_2^T + {{\bf H}_2}_s^{(m)}\, {{\bf B}_2}_s^{(m)}\, \left({{\bf H}_2}_s^{(m)}\right)^T. \]
Note that, the bases of the common subspace should be orthogonal; since no inverse exists for an \(n \times r\) matrix, we relax the orthogonality constraint to \(\mathbf {H_2}^T\mathbf {H_2}=\mathbf {I}\).
Finally, we extract the common bases of the different networks through particular constraints. Structural and attribute information have potential commonalities, which is the core insight of this article. Therefore, to simplify the problem mathematically while losing as little information as possible, we treat the attribute proximity matrix in the same way as the structure proximity matrix. The orthogonal common base matrices \(\mathbf {H}_1\) and \(\mathbf {H}_2\) can be learned by optimizing the matrix factorization model with diagonal cone under orthogonal constraint as follows:
\[
\min _{\mathbf {H}_1, \mathbf {H}_2, \lbrace \mathbf {B}^{(m)}\rbrace } \sum _{m=1}^{M} \left( \left\Vert \mathbf {W}^{(m)} - \mathbf {H}_1 \mathbf {B}^{(m)} \mathbf {H}_1^T \right\Vert _F^2 + \left\Vert \mathbf {M}^{(m)} - \mathbf {H}_2 \mathbf {B}^{(m)} \mathbf {H}_2^T \right\Vert _F^2 \right) \quad \text{s.t.}\ \mathbf {H}_1^T\mathbf {H}_1 = \mathbf {I},\ \mathbf {H}_2^T\mathbf {H}_2 = \mathbf {I},\ \mathbf {B}^{(m)} \in \mathbb {S}_{\text{diag+}}^r,
\]
where \({\Vert \cdot \Vert _F}\) is the Frobenius norm and \(\mathbf {I} \in \mathbb {R}^{r \times r}\) is the identity matrix. \(\mathbb {S}_{\text{diag+}} ^r\) is the cone of nonnegative diagonal matrices, i.e., \(\mathbf {B}^{(m)} = \text{diag}(\lbrace b_i^{(m)}\rbrace)\) with \(b_i^{(m)}\) nonnegative.
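To make the objective concrete, the following minimal sketch evaluates this factorization loss for given factors; all names are illustrative, and the clipping onto the nonnegative diagonal cone is made explicit:

```python
import numpy as np

def mc2_objective(Ws, Ms, H1, H2, Bs):
    """Evaluate the factorization loss over all M networks (a sketch of the
    reconstructed objective above; all names are illustrative).

    Ws, Ms: lists of n x n structure/attribute proximity matrices.
    H1, H2: n x r common base matrices with orthonormal columns.
    Bs: list of length-r vectors, the diagonals of the cone matrices B^(m).
    """
    loss = 0.0
    for W, M, b in zip(Ws, Ms, Bs):
        B = np.diag(np.maximum(b, 0.0))  # stay on the nonnegative diagonal cone
        loss += np.linalg.norm(W - H1 @ B @ H1.T, "fro") ** 2
        loss += np.linalg.norm(M - H2 @ B @ H2.T, "fro") ** 2
    return loss
```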
3.2 Common Subspace Embedding
With the common base matrices, we embed the structure proximity matrices \(\lbrace \mathbf {W}^{(m)}\rbrace _{m=1}^M\) and the attribute proximity matrices \(\lbrace \mathbf {M}^{(m)}\rbrace _{m=1}^M\) of these networks into the inferred common subspace via matrix factorization [9, 10]. Furthermore, we stack the equation system into a unified equation to make the form more concise, i.e.,
\[ \mathbf {D}^{(m)} = \mathbf {H}\, \mathbf {V}^{(m)}, \]
where \(\mathbf {D}^{(m)}\) stacks \(\mathbf {W}^{(m)}\) and \(\mathbf {M}^{(m)}\). Specifically, \(\mathbf {H}\) is the base matrix of the optimal common subspace, the combination of \({{{\bf H}}_1}\) and \({{{\bf H}}_2}\) in Equation (5), whose columns are the bases of the inferred r-dimensional common subspace. \(\mathbf {V}^{(m)} \in {\mathbb {R}}^{r \times n}\) is the embedding matrix of \(S^{(m)}\), whose columns are the embeddings depicting the intrinsic identities of the social accounts. Owing to the orthogonality of \({{{\bf H}}_1}\) and \({{{\bf H}}_2}\), we leverage the orthogonality of \(\mathbf {H}\) to obtain the embedding matrix \(\mathbf {V}^{(m)}\) for each social network. Specifically, as no inverse exists for \(\mathbf {H}\), we left-multiply its pseudo-inverse, \(\mathbf {H}^\dagger =(\mathbf {H}^T \mathbf {H})^{-1} \mathbf {H}^T\), on both sides of Equation (7), i.e.,
\[ \mathbf {H}^\dagger \mathbf {D}^{(m)} = \mathbf {H}^\dagger \mathbf {H}\, \mathbf {V}^{(m)}. \]
Leveraging the orthogonality, i.e., \({\mathbf {H}^T}\mathbf {H} = \mathbf {I}\), we obtain the embedding matrix of the intrinsic identity by \(\mathbf {V}^{(m)}=\mathbf {H}^\dagger \mathbf {D}^{(m)}\).
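As a sanity check of this shortcut, here is a minimal sketch assuming an orthonormal \(\mathbf{H}\); all names are illustrative:

```python
import numpy as np

def embed_network(H, D):
    """Embed one network into the common subspace: V = H^+ D.
    With H^T H = I, the pseudo-inverse (H^T H)^{-1} H^T collapses to H^T,
    so no least-squares solve is needed."""
    return H.T @ D

# Usage with an orthonormal random base (n = 100 accounts, r = 16 dimensions).
rng = np.random.default_rng(0)
H, _ = np.linalg.qr(rng.standard_normal((100, 16)))  # columns are orthonormal
D = rng.standard_normal((100, 100))
V = embed_network(H, D)  # r x n; columns are the account embeddings
```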
3.3 Alignment Identification
Once we obtain the embedding of each account, comprising its structure embedding and attribute embedding, there are many alternatives to measure the affinity between different accounts. Note that, we use the two-dimensional structure and attribute space to depict different accounts in this article, and we choose the Euclidean distance to measure the difference between accounts in these two dimensions: the smaller the Euclidean distance between two accounts, the more likely they belong to the same person. Recall the example in Figure 1(c): in the common subspace, accounts positioned close together, e.g., \(\lbrace v_a^{(1)}, v_a^{(2)}, v_a^{(3)}\rbrace\), are regarded as good candidates for alignment.
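A minimal sketch of this nearest-neighbor matching step could look as follows; treating rows as account embeddings is an illustrative convention, not the article's prescribed layout:

```python
import numpy as np

def align_top_k(V_src, V_tgt, k=5):
    """Rank candidate matches between two embedded networks by Euclidean
    distance in the common subspace (rows are account embeddings here)."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (np.sum(V_src ** 2, axis=1)[:, None]
          + np.sum(V_tgt ** 2, axis=1)[None, :]
          - 2.0 * V_src @ V_tgt.T)
    # For each source account, the k closest target accounts are candidates.
    return np.argsort(d2, axis=1)[:, :k]
```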
The merit of the MC\(^2\) model is to unfold the intrinsic identities of social accounts from the observations of different social networks without supervision.
4 MC2: Optimization
The core of the proposed model is to obtain the orthogonal base of the common subspace via the optimization in Equation (5). However, the optimization is not jointly convex over the variable matrices \(\lbrace \mathbf {B}^{(m)}\rbrace\), \(\mathbf {H_1}\), and \(\mathbf {H_2}\). Thus, it is infeasible to obtain the closed-form optimal solution. To address this nonconvex optimization, we propose an efficient alternating algorithm, which alternately optimizes the objective in Equation (5) w.r.t. one variable matrix while fixing the others. The merit of the proposed alternating algorithm lies in that it efficiently optimizes each subproblem by leveraging its inherent functional property.
4.1 H Subproblem
To optimize the common subspace problem efficiently, we rewrite the objective function in Equation (5) w.r.t. \(\mathbf {H}\), referred to as \(\mathcal {J}(\mathbf {H})\), as follows:
\[ \mathcal {J}(\mathbf {H}) = \sum _{m=1}^{M} \left( \left\Vert \mathbf {W}^{(m)} - \mathbf {H}_1 \mathbf {B}^{(m)} \mathbf {H}_1^T \right\Vert _F^2 + \left\Vert \mathbf {M}^{(m)} - \mathbf {H}_2 \mathbf {B}^{(m)} \mathbf {H}_2^T \right\Vert _F^2 \right), \]
where \(\mathbf {H}\) collects \(\mathbf {H}_1\) and \(\mathbf {H}_2\), and the orthogonal constraint is imposed, i.e., \({\mathbf {H}^T}\mathbf {H} = \mathbf {I}\).
First, we study the functional property of \(\nabla \mathcal {J} (\mathbf {H})\) and show that the gradient \(\nabla \mathcal {J}(\mathbf {H})\) is Lipschitz continuous, as concluded in the following lemma.
Lemma 1 (Lipschitz Continuity).
The partial gradient \(\nabla \mathcal {J}(\mathbf {H})\) is Lipschitz continuous with a constant L.
Proof.
A function \(f(x)\) is Lipschitz continuous if and only if there exists a constant L such that \(|f(x_1)-f(x_2)| \le L|x_1-x_2|\) holds for any \(x_1\) and \(x_2\) in the domain of \(f(x)\). For any two matrices \({\mathbf {H}_a}, {\mathbf {H}_b} \in {\mathbb {R}^{n \times r}}\), we bound \(\Vert \nabla \mathcal {J}(\mathbf {H}_a) - \nabla \mathcal {J}(\mathbf {H}_b) \Vert _F \le L \Vert \mathbf {H}_a - \mathbf {H}_b \Vert _F\). Thus, we obtain the Lipschitz constant L and prove this lemma.□
Second, given that the gradient \({\nabla \mathcal {J} ({{\mathbf {H}}})}\) is Lipschitz continuous, we can employ a fast algorithm, Nesterov's gradient descent, to solve the \(\mathbf {H}\) subproblem in Equation (9). Following the study [19], we introduce an auxiliary matrix \(\mathbf {Y}\) for \(\mathbf {H}\) and optimize the quadratic upper bound of the second-order Taylor expansion as follows:
\[ \mathcal {G}(\mathbf {H}, \mathbf {Y}_l) = \mathcal {J}(\mathbf {Y}_l) + \left\langle \nabla \mathcal {J}(\mathbf {Y}_l), \mathbf {H} - \mathbf {Y}_l \right\rangle + \frac{1}{2\alpha _l} \left\Vert \mathbf {H} - \mathbf {Y}_l \right\Vert _F^2, \]
whose minimal solution is equivalent to that of Equation (9). To this end, we construct two sequences, \(\lbrace {{{{\bf H}}_l}} \rbrace\) and \(\lbrace {{{{\bf Y}}_l}} \rbrace\), and alternately update them in each iteration. Based on the conclusions of previous work [3, 19], at the \({l^{th}}\) iteration, we have the updating rule:
\[ \mathbf {H}_l = P_+\left(\mathbf {Y}_l - \alpha _l \nabla \mathcal {J}(\mathbf {Y}_l)\right), \qquad \mathbf {Y}_{l+1} = \mathbf {H}_l + \beta _l \left(\mathbf {H}_l - \mathbf {H}_{l-1}\right), \]
where \({{\beta _l}}\) is the acceleration coefficient for updating \({\mathbf {Y}_{l + 1}}\). Let \(\hat{\mathbf {H}}_l={{\mathbf {Y}_l} - {\alpha _l}\nabla \mathcal {J}({{\mathbf {Y}_l}})}\). The operator \({P_ + }(\hat{\mathbf {H}}_l)\) projects \(\hat{\mathbf {H}}_l\) onto an orthonormal basis [5]. The step size is set to \(\alpha _l = 2^{-i}\alpha _{l-1}\), where the nonnegative scalar i is the smallest integer satisfying the inequality
\[ \mathcal {J}(\mathbf {H}_l) \le \mathcal {J}(\mathbf {Y}_l) + \left\langle \nabla \mathcal {J}(\mathbf {Y}_l), \mathbf {H}_l - \mathbf {Y}_l \right\rangle + \frac{1}{2\alpha _l}\left\Vert \mathbf {H}_l - \mathbf {Y}_l \right\Vert _F^2. \]
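For concreteness, here is a sketch of this accelerated scheme with the halving step-size search; `J`, `grad_J`, and `project` (the projection \(P_+\)) are caller-supplied assumptions, and \(\beta_l = l/(l+3)\) is one common acceleration coefficient rather than the article's exact setting:

```python
import numpy as np

def nesterov_descent(J, grad_J, project, Y0, alpha0=0.1, iters=50):
    """Accelerated projected gradient for the H subproblem (a sketch)."""
    H = project(Y0)
    H_prev, Y, alpha = H, Y0, alpha0
    for l in range(iters):
        g = grad_J(Y)
        # Backtracking: halve the step (the scalar i in the text) until the
        # quadratic upper bound of J around Y holds at the candidate point.
        while True:
            H_new = project(Y - alpha * g)
            diff = H_new - Y
            bound = J(Y) + np.sum(g * diff) + np.sum(diff ** 2) / (2 * alpha)
            if J(H_new) <= bound:
                break
            alpha /= 2.0
        H_prev, H = H, H_new
        Y = H + (l / (l + 3.0)) * (H - H_prev)  # momentum on the H sequence
    return H
```

Note that once the halved step falls below \(1/L\) the bound always holds, so the step size is never decreased further, matching the analysis below.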
Obviously, the challenge of employing the fast Nesterov gradient descent is to solve \({\mathbf {H}_l}={P_ + }(\hat{\mathbf {H}}_l)\). Now, we derive the closed-form solution. Specifically, we formulate \({P_ + }(\hat{\mathbf {H}}_l)\) as an optimization problem:
\[ {P_+}(\hat{\mathbf {H}}_l) = \arg \min _{\mathbf {H}} \left\Vert \mathbf {H} - \hat{\mathbf {H}}_l \right\Vert _F^2 \quad \text{s.t.}\ \mathbf {H}^T\mathbf {H} = \mathbf {I}. \]
To address the optimization problem in Equation (14), we utilize the Lagrangian multiplier method with multiplier matrix \(\mathbf {\Lambda }\) and obtain the following Lagrangian:
\[ \mathcal {L}(\mathbf {H}, \mathbf {\Lambda }) = \left\Vert \mathbf {H} - \hat{\mathbf {H}}_l \right\Vert _F^2 + \left\langle \mathbf {\Lambda }, \mathbf {H}^T\mathbf {H} - \mathbf {I} \right\rangle, \]
coupled by the two variables \(\mathbf {H}\) and \({\bf \Lambda }\). Fortunately, we can eliminate the variable \({\bf H}\) owing to its orthogonality: setting the gradient of the Lagrangian w.r.t. \(\mathbf {H}\) to zero and leveraging \(\mathbf {H}^T\mathbf {H} = \mathbf {I}\) yields
\[ \hat{\mathbf {H}}_l^T \hat{\mathbf {H}}_l = \frac{\left({{{\bf \Lambda } }^T} + {{\bf \Lambda } }\right)^2}{4}. \]
As the real matrix \({({{{\bf \Lambda } }^T} + {{\bf \Lambda } })^2}/4\) is symmetric, its eigendecomposition is unique. Hence, we perform the eigendecomposition of \({\hat{\mathbf {H}}_l}^T {\hat{\mathbf {H}}_l}\),
\[ {\hat{\mathbf {H}}_l}^T {\hat{\mathbf {H}}_l} = \mathbf {C}\, \mathbf {D}\, \mathbf {C}^T, \]
where \(\mathbf {C}\) is the eigenvector matrix and \(\mathbf {D}\) is the nonnegative eigenvalue matrix. Due to the nonnegativity of \(\mathbf {\Lambda }\) and the orthogonality of \(\mathbf {C}\), we have
\[ \mathbf {H}_l = {P_+}(\hat{\mathbf {H}}_l) = \hat{\mathbf {H}}_l\, \mathbf {C}\, \mathbf {D}^{-\frac{1}{2}}\, \mathbf {C}^T. \]
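A minimal sketch of this closed-form projection, following the eigendecomposition above (equivalently \(\hat{\mathbf{H}}_l(\hat{\mathbf{H}}_l^T\hat{\mathbf{H}}_l)^{-1/2}\), assuming \(\hat{\mathbf{H}}_l\) has full column rank):

```python
import numpy as np

def project_orthonormal(H_hat):
    """Closed-form projection P_+ onto matrices with orthonormal columns:
    eigendecompose H_hat^T H_hat = C diag(d) C^T and rescale, giving
    H_hat (H_hat^T H_hat)^{-1/2}; assumes H_hat has full column rank."""
    d, C = np.linalg.eigh(H_hat.T @ H_hat)  # symmetric, nonnegative spectrum
    return H_hat @ C @ np.diag(1.0 / np.sqrt(d)) @ C.T

# The result satisfies the orthogonality constraint H^T H = I.
H = project_orthonormal(np.random.default_rng(1).standard_normal((100, 16)))
assert np.allclose(H.T @ H, np.eye(16))
```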
Finally, we analyze the convergence rate of Nesterov's gradient descent in solving this subproblem.
Lemma 2.
For the objective function \({\cal J}({{\bf H}})\), if the sequence \(\lbrace {{{{\bf H}}_l}} \rbrace _0^\infty\) is constructed by Equation (14), then the following assertions are true:
(1) The convergence rate satisfies
\[ \mathcal {J}(\mathbf {H}_l) - \mathcal {J}(\mathbf {H}^*) \le \frac{4L\left\Vert \mathbf {Y}_0 - \mathbf {H}^* \right\Vert ^2}{(l+1)^2} = \frac{E}{(l+1)^2}, \]
where \({{\mathbf {Y}_0}}\) is the initial matrix and \({L}\) is the Lipschitz constant defined in Lemma 1 (Lipschitz Continuity). \({\mathbf {H}^ * }\) is the optimal solution of the optimization in Equation (9).
(2) Two conditions need to be guaranteed to achieve accuracy \(\varepsilon\): computing the gradient of the objective function \(\nabla {\cal J}({{\bf H}})\) no more than \(\lfloor \sqrt {E/\varepsilon } \rfloor\) times, and evaluating the objective function \({\cal J}({{\bf H}})\) no more than \(2\lfloor \sqrt {E/\varepsilon } \rfloor + \lfloor {{\log }_2}(2L{\alpha _{ - 1}}) \rfloor + 1\) times, where \(\lfloor \cdot \rfloor\) denotes rounding down and \(E = 4L{\Vert {{{{\bf Y}}_0} - {{{\bf H}}^*}} \Vert ^2}\).
Proof.
First, let \({{\mathbf {Y}}_l}(\alpha) = {{\mathbf {Y}}_l} - \alpha \nabla {\cal J}({{{\mathbf {Y}}_l}})\) and substitute it into Equation (13). Therefore, as soon as \({2^{ - i}}{\alpha _{l - 1}}\) becomes less than \(\frac{1}{L}\), Equation (15) is satisfied and \({\alpha _l}\) is not decreased further. As a result, \(\alpha _l \ge \frac{1}{2L}\).
It remains to observe that \({\alpha _{l + 1}} \ge {\alpha _l} + 0.5 \ge 1 + 0.5({l + 1}).\)
According to the estimation of the convergence rate above, achieving accuracy \(\varepsilon\) requires no more than \(\lfloor \sqrt {E/\varepsilon } \rfloor\) iterations of the updating rule. Through Equations (14) and (15), one gradient and two values of the objective function need to be computed in each iteration. Note that, each additional evaluation of the objective function \({\cal J}({{\bf H}})\) corresponds to a halving of \({\alpha _l}\). Therefore, the objective function is evaluated no more than \(2\lfloor \sqrt {E/\varepsilon } \rfloor + \lfloor {{\log }_2}(2L{\alpha _{ - 1}}) \rfloor + 1\) times. This completes the proof of Lemma 2.
That is, we can claim that Nesterov's gradient descent enjoys a much faster convergence rate of \({\rm O}({\frac{1}{{{l^2}}}})\) for optimizing \(\mathbf {H}\), where l is the number of iterations.
4.2 B Subproblem
To solve the \(\mathbf {B}\) subproblem, we solve for each \(\mathbf {B}^{(m)}\) in parallel, as the variable matrices \(\lbrace \mathbf {B}^{(m)} \rbrace _{m=1}^M\) are not coupled. The objective function in Equation (5) w.r.t. \(\mathbf {B}^{(m)}\), referred to as \(\mathcal {J}(\mathbf {B}^{(m)})\), is:
\[ \mathcal {J}(\mathbf {B}^{(m)}) = \left\Vert \mathbf {W}^{(m)} - \mathbf {H}_1 \mathbf {B}^{(m)} \mathbf {H}_1^T \right\Vert _F^2 + \left\Vert \mathbf {M}^{(m)} - \mathbf {H}_2 \mathbf {B}^{(m)} \mathbf {H}_2^T \right\Vert _F^2. \]
Leveraging the orthogonal constraints, i.e., \({{\bf H}}_1^T{{{\bf H}}_1} = {{\bf I}}\) and \({{\bf H}}_2^T{{{\bf H}}_2} = {{\bf I}}\), we derive the closed-form solution for each \(\mathbf {B}^{(m)}\).
First, due to the diagonal structure of \(\mathbf {B}^{(m)}\), we reformulate the third-order term in Equation (31) as a linear combination of rank-one matrices [6]:
\[ \mathbf {H}_j\, \mathbf {B}^{(m)}\, \mathbf {H}_j^T = \sum _{i=1}^{r} b_i^{(m)}\, {{{\bf h}}_{{j_i}}}\, {{{\bf h}}_{{j_i}}}^T, \quad j = 1, 2, \]
where \({{{\bf h}}_{{j_i}}}\) is the \({i\textrm {th}}\) column of \({{{\bf H}}_j}\) and \(b_i^{(m)}\) is the corresponding nonnegative eigenvalue. Then, we notice that the optimization in Equation (32) is a quadratic programming problem under inequality constraints. We derive the Lagrangian function with multipliers \(\lbrace \lambda _i^{(m)}\rbrace _{i=1}^r\),
\[ \mathcal {L}\left(\mathbf {B}^{(m)}, \lbrace \lambda _i^{(m)}\rbrace \right) = \mathcal {J}(\mathbf {B}^{(m)}) - \sum _{i=1}^{r} \lambda _i^{(m)} b_i^{(m)}. \]
There are 2r variables, but we can make use of the complementary slackness condition \({\lambda ^{(m)} _i}b_i^{(m)} = 0\). That is, if the trace term in the numerator is negative, we set \(b^{(m)} _ i = 0\); otherwise, we have
\[ b_i^{(m)} = \frac{\text{tr}\left(\mathbf {h}_{1_i}^T \mathbf {W}^{(m)} \mathbf {h}_{1_i}\right) + \text{tr}\left(\mathbf {h}_{2_i}^T \mathbf {M}^{(m)} \mathbf {h}_{2_i}\right)}{2}. \]
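Under the reconstruction above, a per-coordinate sketch of this update (names illustrative; the clipping realizes the complementary slackness condition) is:

```python
import numpy as np

def update_B(W, M, H1, H2):
    """Closed-form update of the diagonal cone matrix B^(m) (a sketch under
    the reconstructed objective; all names are illustrative).

    With orthonormal columns h_{1,i} and h_{2,i}, the subproblem decouples
    per coordinate; clipping negative values to zero realizes the
    complementary slackness condition lambda_i * b_i = 0.
    """
    r = H1.shape[1]
    b = np.empty(r)
    for i in range(r):
        h1, h2 = H1[:, i], H2[:, i]
        b[i] = max(0.0, (h1 @ W @ h1 + h2 @ M @ h2) / 2.0)
    return np.diag(b)
```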
For the computational complexity, we analyze each subproblem. In the \(\mathbf {H}\) subproblem, the major cost is computing the gradient \(\nabla \mathcal {J}({{\mathbf {H}_l}})\) in Equation (9). Let r denote the dimension of the common subspace, i.e., the rank of the base matrix \(\mathbf {H}\). The complexity of updating \(\mathbf {H}\) is thus \({\rm O}({2{K_1}M \cdot {n^2}r})\), where \({{K_1}}\) is the number of iterations of lines 4–8 in Algorithm 1 and \(n= \max \lbrace N^{(m)}\rbrace\) is the number of users in the largest social network. Note that, \(r \ll n\) and \({{K_1}}\) is a small constant owing to the \({\rm O}({\tfrac{1}{{{K_1^2}}}})\) convergence rate of Nesterov's gradient descent. In the \(\mathbf {B}\) subproblem, we reformulate the problem into a quadratic programming problem with nonnegative constraints via the rank-one matrix trick, and the complexity of updating \(\mathbf {B}\) via Equation (31) is \({\rm O}({M\cdot {n^2}r})\). We thus conclude that the overall computational complexity of Algorithm 1 is on the order of \({\rm O}({n^2})\), as the dimension r, the number of social networks M, and the iteration number \({{K_1}}\) are all small constants. Note that, the \(O(n^2)\) complexity of MC\(^2\) is of the same order as other matrix-optimization baselines, e.g., MASTER, while achieving better performance. MC\(^2\) has a larger per-iteration computational cost than the deep learning methods, e.g., DeepLink, but converges much faster owing to the well-designed algorithm based on the Lipschitz continuity of the objective and Nesterov's gradient descent.
For the space complexity, MC\(^2\) reduces its requirement to \(O(n^2)\) owing to the diagonal block structure. That is, MC\(^2\) only requires the space of the input networks, and its complexity is of the same order as other typical matrix optimization methods. Fortunately, as the matrices of real-world networks are usually sparse, off-the-shelf sparse matrix techniques can be leveraged to further reduce the space requirement.
5 Experiments
In this section, we first present the experimental setup, including the datasets, comparison methods, parameter setup, and evaluation metric. Then, we report the performance of different network alignment methods and show that the proposed MC\(^2\) model outperforms the state-of-the-art methods.
5.1 Experimental Setups
5.1.1 Datasets.
In this article, we use three real-world datasets, i.e., the Twitter-Foursquare, Weibo-Douban, and DBLP-AMiner datasets, to evaluate the proposed MC\(^2\) model. The statistics of these datasets are given in Table 2. The details of the datasets are introduced as follows:
Table 2. Statistics of the Datasets

| Dataset | #(Nodes) | #(Links) | #(Anchor Users) |
| --- | --- | --- | --- |
| Twitter | 5,167 | 164,660 | 2,858 |
| Foursquare | 5,240 | 76,972 | |
| Weibo | 5,234 | 102,566 | 3,088 |
| Douban | 5,234 | 238,235 | |
| DBLP | 24,352 | 273,476 | 18,255 |
| AMiner | 26,386 | 316,565 | |

Each anchor user count is shared by the corresponding network pair.
–
Twitter-Foursquare: Twitter is one of the most popular social networking platforms, where users can post tweets to share their opinions. Foursquare is a popular location sharing platform, where users are encouraged to share information about their current location with others. These datasets are widely used in prior studies [8, 14, 37, 45]. There are 5,167 Twitter identities and 5,240 Foursquare identities, among which 2,858 pairs of identities are matched as the ground truth.
–
Weibo-Douban: Sina Weibo is a famous Chinese social platform. Douban is a Chinese social service where people can share comments on movies, books, and music. We collected the ground truth from an open dataset website. We finally have 5,234 Weibo identities and 5,234 Douban identities with 3,088 pairs of matched identities.
–
DBLP-AMiner: Both DBLP and AMiner are collections of academic articles and can be regarded as academic social networks. We use a partially aligned subset for the experimental evaluation. Specifically, authors are the nodes, and two nodes are linked if the corresponding authors have co-authorship in the article collection.
The existing datasets contain information of only two networks and thus cannot directly be used for multiple network alignment. Therefore, we randomly divide each network into two sub-networks to obtain datasets of multiple networks.
5.1.2 Data Preprocessing.
For Twitter users, the dataset contains two aspects of information. One is structure information, i.e., the following and follower relationships. The other is attribute information, i.e., name, location, and education. For Foursquare users, the dataset contains similar information. Without loss of generality, in this article, we only leverage the data common to both datasets. Besides, we expand the smaller networks when the dimensions of the input networks differ.
Note that, the network adjacency matrix can only characterize the first-order proximity, which reflects merely the local pairwise proximity between nodes and can hardly capture the hidden pairwise proximity in full. As a result, similar to the study [20], this article chooses a higher-order structure proximity to model the structural strength between nodes, as sketched below.
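As one concrete instance, a decayed sum of adjacency powers is a common higher-order proximity; this construction is an illustrative assumption, not necessarily the exact choice of [20]:

```python
import numpy as np

def higher_order_proximity(A, order=3, decay=0.5):
    """A decayed sum of adjacency powers, one common higher-order proximity:
    W = sum_{k=1..order} decay^(k-1) * A^k."""
    W = np.zeros_like(A, dtype=float)
    Ak = np.eye(A.shape[0])
    for k in range(order):
        Ak = Ak @ A             # A^(k+1), the (k+1)-hop reachability weights
        W += (decay ** k) * Ak  # farther hops contribute with smaller weight
    return W
```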
5.1.3 Comparison Methods.
To evaluate the proposed model, we choose several state-of-the-art methods for comparison. Note that, the comparison methods cover unsupervised, semi-supervised, and supervised fashions, briefly summarized as follows:
–
UMNA [37]: We employ the UMNA model as an unsupervised UIL baseline, which is a two-step network alignment framework. It matches anonymized networks by minimizing the friendship inconsistency and selects anchor links which can lead to the maximum confidence scores across anonymized social networks.
–
UUIL\(_\text{gan}\) [11]: This is an unsupervised approach based on minimizing the distance between the distributions of user identities in two social networks. It leverages WGAN [1] to transfer embeddings into the same domain to perform unsupervised bi-network alignment. It needs to align multiple social networks in a pairwise way.
–
MASTER [24]: This is a semi-supervised user alignment model. It constructs a common subspace of multiple social networks via uni- and joint-embedding, incorporating both attribute and structural features. Note that, we only consider the structure information for fair comparison.
–
PALE [17]: This is a supervised user alignment model. It employs network embedding with awareness of observed anchor links as supervised information to capture structural regularities and further learns a stable cross-network mapping for predicting anchor links.
–
DeepLink [45]: This is a supervised user alignment model. It leverages dual learning to improve common subspace construction to perform alignment via a proposed neural network framework.
5.1.4 Parameter Setup.
For the proposed MC\(^2\) model, we set the dimension r of the common subspace to 512 and the step size \({\alpha _{0}}\) for matrix optimization to 0.1 by default. The base matrix \(\mathbf {H}\) of the common subspace and the auxiliary matrix \(\mathbf {Y}_0\) are initialized as random \(n \times r\) nonnegative matrices, where n is the number of users in the largest social network. The perturbation coefficient \({\beta _0}\) is set to \(\frac{e}{L}\), where L is the Lipschitz constant, following the mathematical studies [3, 19]. For the comparison methods (DeepLink, PALE, MASTER, and UUIL\(_\text{gan}\)), we employ the best-performing configuration of each competitor, as reported in their original articles. Note that, we split the datasets into training and testing sets in the same way for all the supervised methods.
5.1.5 Evaluation Metric.
To quantify the alignment performance, we utilize precision\(@K\) as the evaluation metric, calculated by
\[ \text{precision@}K = \frac{1}{N}\sum _{i=1}^{N} \mathbb {I}_i \lbrace success@K \rbrace, \]
where \(\mathbb {I}_i \lbrace success@K\rbrace\) indicates whether the ground truth appears in the candidate list of length K, and N is the number of testing anchors. The higher the value of precision\(@K\), the better the performance.
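For clarity, a minimal sketch of this metric with a toy usage:

```python
def precision_at_k(candidates, ground_truth):
    """precision@K: the fraction of testing anchors whose true match appears
    in their length-K candidate list (candidates[i] belongs to anchor i)."""
    hits = sum(gt in cand for cand, gt in zip(candidates, ground_truth))
    return hits / len(ground_truth)

# Toy usage with N = 3 anchors and K = 2: two lists contain the truth.
print(precision_at_k([[4, 7], [1, 0], [9, 2]], [7, 3, 2]))  # 0.666...
```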
5.2 Experimental Results
We repeated each experiment 10 times and report the mean value with its \(95\%\) confidence interval. We present the main results of all comparison methods in terms of precision in Table 3.
Table 3. Alignment Results on Twitter-Foursquare, Weibo-Douban, and DBLP-AMiner Datasets in Terms of Precision (%)

| Method | Twitter-Foursquare (Bi) | Twitter-Foursquare (Multi) | Weibo-Douban (Bi) | Weibo-Douban (Multi) | DBLP-AMiner (Bi) | DBLP-AMiner (Multi) |
| --- | --- | --- | --- | --- | --- | --- |
| UUIL\(_\text{gan}\) | \(33.56\pm 0.07\) | \(27.05\pm 0.09\) | \(45.03\pm 0.08\) | \(36.23\pm 0.06\) | \(48.17\pm 0.05\) | \(42.55\pm 0.07\) |
| UMNA | \(33.27\pm 0.04\) | \(30.03\pm 0.06\) | \(47.00\pm 0.04\) | \(41.33\pm 0.03\) | \(43.05\pm 0.02\) | \(41.68\pm 0.03\) |
| MASTER | \(43.76\pm 0.06\) | \(36.57\pm 0.05\) | \(55.81\pm 0.05\) | \(47.82\pm 0.04\) | \(55.65\pm 0.03\) | \(54.12\pm 0.10\) |
| PALE | \(45.29\pm 0.04\) | \(37.96\pm 0.06\) | \(56.88\pm 0.06\) | \(49.33\pm 0.03\) | \(53.18\pm 0.09\) | \(51.33\pm 0.02\) |
| DeepLink | \(47.13\pm 0.06\) | \(39.56\pm 0.04\) | \(58.39\pm 0.05\) | \(51.50\pm 0.05\) | \(62.72\pm 0.03\) | \(59.67\pm 0.04\) |
| MC\(^2\) (Ours) | \(\mathbf {49.64\pm 0.03}\) | \(\mathbf {43.63\pm 0.02}\) | \(\mathbf {59.89\pm 0.04}\) | \(\mathbf {52.11\pm 0.03}\) | \(\mathbf {63.50\pm 0.11}\) | \(\mathbf {61.33\pm 0.05}\) |

The best results are in bold, and the second best underlined.
5.2.1 Comparison with Unsupervised Methods.
We compare our model with the unsupervised baselines under various settings. For both bi-network and multi-network alignment, we report precision\(@20\) varying the overlap rate \(\eta\) from 10% to 30% in Figures 3(a), (c) and 4(a), (c). For both datasets, the overlap rate \(\eta\) is measured by \(\frac{2A}{X+Y}\), where A is the number of anchor users, and X and Y are the numbers of users in the two social networks to be aligned. An \(\eta\)-overlap dataset is generated by randomly deleting users. Generally, the precision rises as the overlap rate increases. Fixing the overlap rate at \(\eta =30 \%\), we report precision\(@K\) varying K from 5 to 25 in Figures 3(b), (d) and 4(b), (d). As a baseline, UUIL\(_\text{gan}\) obtains the lowest precision\(@K\) scores on all the datasets. The reason is that, by assuming independently distributed input vectors, UUIL\(_\text{gan}\) cannot effectively leverage the relational data in the network to perform social network alignment. The proposed MC\(^2\) model achieves \(1.9 \times\) the precision\(@5\) of UUIL\(_\text{gan}\) under \(\eta =30 \%\). It is interesting to observe that the precision gain of the proposed MC\(^2\) model over its competitors is greater in the multi-network case than in the bi-network case. The reason is that the proposed MC\(^2\) model effectively addresses multiple social network alignment in a unified approach instead of conducting pairwise alignments exhaustively. MC\(^2\) thus presents evident superiority: it infers a common subspace where social accounts are naturally aligned, whereas its competitors incur the global inconsistency, leading to a drop in precision.
Fig. 3. Experimental results on Twitter-Foursquare dataset.
Fig. 4. Experimental results on Weibo-Douban dataset.
5.2.2 Comparison with Supervised Methods.
We compare the proposed method with the state-of-the-art supervised and semi-supervised methods. The results are shown in Figures 3 and 4. Similar to the unsupervised methods, the performance of the supervised methods (PALE and DeepLink) and the semi-supervised method (MASTER) increases with the overlap rate. Besides, the precision of the supervised and semi-supervised methods is higher than that of the unsupervised methods (UMNA and UUIL\(_{\text{gan}}\)). Comparing Figures 3 and 4, we find that the performance on the Weibo-Douban dataset is better than that on the Twitter-Foursquare dataset, mainly because the complexity of the former dataset is greater than that of the latter. On both datasets, it remains obvious that the proposed MC\(^2\) model outperforms all the baseline methods.
5.2.3 On the Efficiency.
To investigate the efficiency of the proposed MC\(^2\), we examine the running time of each method in practice on datasets of different scales. We summarize the running time of each baseline on all the datasets in Table 4. Note that, for clarity, we use the running time of MC\(^2\) as the time unit for each dataset. As shown in Table 4, the proposed MC\(^2\) achieves the best or the second-best running time in practice, showing its efficiency. Together with the results in Table 3, we conclude that MC\(^2\) outputs promising alignment results with light overhead.
Table 4. Running Time on Twitter-Foursquare, Weibo-Douban, and DBLP-AMiner Datasets

| Method | Twitter-Foursquare (Bi) | Twitter-Foursquare (Multi) | Weibo-Douban (Bi) | Weibo-Douban (Multi) | DBLP-AMiner (Bi) | DBLP-AMiner (Multi) |
| --- | --- | --- | --- | --- | --- | --- |
| UUIL\(_\text{gan}\) | \(1.14\pm 0.10\) | \(\mathbf {0.97\pm 0.09}\) | \(1.03\pm 0.15\) | \(1.23\pm 0.11\) | \(1.26\pm 0.12\) | \(1.05\pm 0.11\) |
| UMNA | \(\mathbf {0.87\pm 0.05}\) | \(1.02\pm 0.04\) | \(1.12\pm 0.07\) | \(1.09\pm 0.05\) | \(1.06\pm 0.10\) | \(\mathbf {0.89\pm 0.08}\) |
| MASTER | \(1.22\pm 0.01\) | \(1.18\pm 0.02\) | \(1.27\pm 0.03\) | \(1.24\pm 0.03\) | \(2.15\pm 0.01\) | \(1.33\pm 0.02\) |
| PALE | \(1.07\pm 0.02\) | \(1.33\pm 0.06\) | \(1.45\pm 0.10\) | \(1.17\pm 0.06\) | \(1.53\pm 0.05\) | \(1.67\pm 0.09\) |
| DeepLink | \(1.58\pm 0.09\) | \(1.67\pm 0.07\) | \(1.08\pm 0.03\) | \(1.20\pm 0.05\) | \(1.96\pm 0.07\) | \(1.52\pm 0.06\) |
| MC\(^2\) (Ours) | 1 | 1 | \(\mathbf {1}\) | \(\mathbf {1}\) | \(\mathbf {1}\) | 1 |

The best results are in bold, and the second best underlined.
5.2.4 Parameter Sensitivity.
We study the parameter sensitivity with respect to the step size in the optimization and the dimension of the common subspace.
(i) For the step size, we vary the value of \(\alpha _0\) over {0.1, 0.03, 0.01, 0.008, 0.003} and summarize the alignment results in Table 5. A large step size usually results in suboptimal alignment. The underlying reason is that the optimum may be overshot when the step is too large, so the solution tends to oscillate around a suboptimum instead. However, it is noteworthy that a small step size inevitably enlarges the convergence time and still cannot escape a suboptimum. Thus, a moderate step size, say \(\alpha _0=0.03\), is preferable.
Table 5. Parameter Sensitivity on Twitter-Foursquare and Weibo-Douban Datasets in Terms of Precision (%)

| Step Size | Twitter-Foursquare (Bi) | Twitter-Foursquare (Multi) | Weibo-Douban (Bi) | Weibo-Douban (Multi) |
| --- | --- | --- | --- | --- |
| \(\alpha _0 = 0.1\) | \(49.64\pm 0.03\) | \(43.63\pm 0.02\) | \(59.89\pm 0.04\) | \(52.11\pm 0.03\) |
| \(\alpha _0 = 0.03\) | \(50.03\pm 0.02\) | \(44.93\pm 0.01\) | \(60.16\pm 0.05\) | \(52.23\pm 0.04\) |
| \(\alpha _0 = 0.01\) | \(50.52\pm 0.05\) | \(45.18\pm 0.10\) | \(60.48\pm 0.07\) | \(52.56\pm 0.01\) |
| \(\alpha _0 = 0.008\) | \(\mathbf {50.92\pm 0.01}\) | \(45.35\pm 0.06\) | \(\mathbf {60.55\pm 0.02}\) | \(\mathbf {52.72\pm 0.03}\) |
| \(\alpha _0 = 0.003\) | \(50.97\pm 0.07\) | \(\mathbf {45.48\pm 0.03}\) | \(60.51\pm 0.08\) | \(52.67\pm 0.05\) |

The best results are in bold, and the second best underlined.
(ii) For the dimension, we vary the value of r over [\(2^6\), \(2^7\), \(2^8\), \(2^9\), \(2^{10}\)] and show the alignment results in Figure 5. As shown in the figures, the performance of MC\(^2\) is undesirable when the dimension r is low, and improves as r increases. The reason lies in that, with a low dimension, the embeddings tend to be entangled together, leading to alignment errors. However, no further precision gain is obtained once the dimension is large enough. Thus, a moderate dimension, say 512 in this case, leads to promising alignment.
Fig. 5. Parameter sensitivity study on the dimension of common subspace.
6 Related Work
We briefly summarize the advances on social network alignment. We categorize the existing studies in the literature into two classes: supervised alignment and unsupervised alignment.
6.1 Supervised Alignment
The pioneering study [33] formulated the problem of social network alignment in the supervised setting. Researchers have mainly focused on aligning a pair of social networks. Following the study [33], Mobius [34] explores the latent patterns in user screen names for linking identities across social networks. As one of the pioneers, Kong et al. [8] proposed to leverage heterogeneous attributes and formulated anchor link prediction as a stable matching problem. Similarly, Hydra [15] incorporates heterogeneous attributes to identify the latent consistency of social accounts via multi-objective optimization. Motivated by the recent success of network embedding [4, 22, 27], others [14, 17, 26, 45] exploited the network topology for social network alignment. PALE [17] predicts anchor links across social networks via a novel embedding-matching framework. Recently, DeepLink [45] leverages dual learning to facilitate common subspace construction and performs alignment via a neural framework. Moreover, some studies [2, 30, 36] exploit both structure and heterogeneous attribute information.
Actually, individuals nowadays often join multiple social networks. Hence, researchers have recently begun to investigate the general problem of multiple social network alignment, which is inherently different from bi-network alignment due to the global inconsistency. A few methods [18, 24, 40] attempt to align multiple social networks but heavily rely on supervision. Specifically, COSNET [40] considers both local and global consistency in aligning multiple social networks by optimizing an energy-based model with a proposed sub-gradient algorithm. ULink [18] links user identities from multiple social networks by modeling user attributes in the latent user space of intrinsic identities. MASTER [24] performs uni- and joint-embedding to construct a common subspace of multiple social networks in a semi-supervised approach. Recently, considering the inherent geometry of social networks, researchers have explored the hyperbolic space and achieved promising results [25, 29].
6.2 Unsupervised Alignment
Acquiring supervision information involves extensive human annotation, which is often intractable in real-world scenarios. A few methods [11, 32, 44] in the literature attempt to align social networks without supervision. UUIL [11], as one of the pioneers, leverages a generative adversarial network to minimize the distance between the distributions of user identities for aligning two social networks without supervision. Factoid Embedding [32] encodes the heterogeneous attributes of social accounts to identify the underlying identity of the individual from a pair of social networks. CoLink [44] employs a novel co-training algorithm manipulating an attribute-based model and a relationship-based model, making the two models reinforce each other iteratively in an unsupervised approach. Unfortunately, however, these methods cannot directly extend to align multiple social networks. Zhang et al. [39] proposed a novel unsupervised approach, UMNA, to align multiple social networks by conducting multiple pairwise alignments under the transitive rule. Furthermore, Liao et al. [13] proposed a spectral clustering method, IsoRankN, which leverages the structural similarity between protein networks to perform community (multi-node) alignment. The difference lies in that our article focuses on user (single-node) alignment instead of community alignment. Recently, with the advances of neural graph learning [41, 42], Huynh et al. [7] leveraged graph convolutional networks for social network alignment.
The survey [23] gives a comprehensive summary. We summarize most of the existing methods in Table 6. The essential distinction between our work and all the studies above lies in that we, for the first time, address the problem of unsupervised multiple social network alignment in a unified approach and design the novel unsupervised MC\(^2\) model, filling this technical gap in social network alignment.
Table 6. A Summary of Existing Social Network Alignment Models
7 Conclusion
In this article, we propose to study the problem of unsupervised multiple social network alignment. To bridge this gap, we propose the novel Matrix factorization model with diagonal Cone under orthogonal Constraint (MC\(^2\)), which effectively embeds and aligns multiple social networks without supervision. In the MC\(^2\) model, we design a constrained matrix optimization to infer the common subspace from different social networks, in which these networks are embedded for unsupervised alignment. To address the nonconvex optimization in the MC\(^2\) model, we design an efficient alternating algorithm leveraging its functional property to approach the optimum. We evaluated the MC\(^2\) model on three real-world social network datasets. The experimental results demonstrate that the proposed MC\(^2\) model outperforms the state-of-the-art methods.
Acknowledgments
Sincere thanks to the reviewers for their constructive comments.
Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, and Aaron C. Courville. 2017. Improved training of Wasserstein GANs. In Advances in NeurIPS. 5767–5777.
Xiaokai Chu, Xinxin Fan, Di Yao, Zhihua Zhu, Jianhui Huang, and Jingping Bi. 2019. Cross-network embedding for multi-network alignment. In Proceedings of the Web Conference (WWW).
Chris H. Q. Ding, Tao Li, Wei Peng, and Haesun Park. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proceedings of the SIGKDD.
Thanh Trung Huynh, Van Vinh Tong, Thanh Tam Nguyen, Hongzhi Yin, Matthias Weidlich, and Nguyen Quoc Viet Hung. 2020. Adaptive network alignment with unsupervised and multi-order convolutional networks. In Proceedings of the ICDE. IEEE, 85–96.
Xiangnan Kong, Jiawei Zhang, and Philip S. Yu. 2013. Inferring anchor links across multiple heterogeneous social networks. In Proceedings of the CIKM. 179–188.
Chaozhuo Li, Senzhang Wang, Philip S. Yu, Lei Zheng, Xiaoming Zhang, Zhoujun Li, and Yanbo Liang. 2018. Distribution distance minimization for unsupervised user identity linkage. In Proceedings of the CIKM. 447–456.
Chaozhuo Li, Yukun Wang, Senzhang Wang, Yun Liu, Philip S. Yu, Zhoujun Li, and Yanbo Liang. 2019. Adversarial learning for weakly-supervised social network alignment. In Proceedings of the AAAI.
Chung-Shou Liao, Kanghao Lu, Michael Baym, Rohit Singh, and Bonnie Berger. 2009. IsoRankN: Spectral methods for global alignment of multiple protein networks. Bioinformatics 25, 12 (2009), i253–i258.
Li Liu, William K. Cheung, Xin Li, and Lejian Liao. 2016. Aligning users across social networks using network embedding. In Proceedings of the IJCAI. 1774–1780.
Siyuan Liu, Shuhui Wang, Feida Zhu, Jinbo Zhang, and Ramayya Krishnan. 2014. Hydra: Large-scale social identity linkage via heterogeneous behavior modeling. In Proceedings of the SIGMOD. 51–62.
Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-domain recommendation: An embedding and mapping approach. In Proceedings of the IJCAI. 2464–2470.
Tong Man, Huawei Shen, Shenghua Liu, Xiaolong Jin, and Xueqi Cheng. 2016. Predict anchor links across social networks via an embedding approach. In Proceedings of the IJCAI. 1823–1829.
Xin Mu, Feida Zhu, Ee-Peng Lim, Jing Xiao, Jianzong Wang, and Zhi-Hua Zhou. 2016. User identity linkage by latent user space modelling. In Proceedings of the SIGKDD. 1775–1784.
Yu. E. Nesterov. 1983. A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Proceedings of the USSR Academy of Sciences (Mathematics) 269, 3 (1983), 543–547.
Jaap W. Ouwerkerk and Benjamin K. Johnson. 2016. Motives for online friending and following: The dark side of social network site connections. Social Media + Society 2, 3 (2016), 1–13.
Kai Shu, Suhang Wang, Jiliang Tang, Reza Zafarani, and Huan Liu. 2017. User identity linkage across online social networks: A review. SIGKDD Explorations 18, 2 (2017), 5–17.
Sen Su, Li Sun, Zhongbao Zhang, Gen Li, and Jielun Qu. 2018. MASTER: Across multiple social networks, integrate attribute and structure embedding for reconciliation. In Proceedings of the IJCAI. 3863–3869.
Li Sun, Zhongbao Zhang, Jiawei Zhang, Feiyang Wang, Yang Du, Sen Su, and Philip S. Yu. 2020. Perfect: A hyperbolic embedding for joint user and community alignment. In Proceedings of the ICDM. IEEE, 501–510.
Shulong Tan, Ziyu Guan, Deng Cai, Xuzhen Qin, Jiajun Bu, and Chun Chen. 2014. Mapping users across networks by manifold alignment on hypergraph. In Proceedings of the AAAI. 159–165.
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale information network embedding. In Proceedings of the WWW. 1067–1077.
S. Utz, M. Tanis, and I. Vermeulen. 2012. It is all about being popular: The effects of need for popularity on social network site use. Cyberpsychology, Behavior, and Social Networking 15, 1 (2012), 37–42.
Feiyang Wang, Li Sun, and Zhongbao Zhang. 2020. Hyperbolic user identity linkage across social networks. In Proceedings of the IEEE Global Communications Conference. IEEE, 1–6.
Yaqing Wang, Chunyan Feng, Ling Chen, Hongzhi Yin, Caili Guo, and Yunfei Chu. 2018. User identity linkage across social networks via linked heterogeneous network embedding. World Wide Web 22, 6 (2018), 2611–2632.
Shan-Hung Wu, Hao-Heng Chien, Kuan-Hua Lin, and Philip S. Yu. 2014. Learning the consistent behavior of common users for target node prediction across social networks. In Proceedings of the ICML. 298–306.
Wei Xie, Xin Mu, Roy Ka-Wei Lee, Feida Zhu, and Ee-Peng Lim. 2018. Unsupervised user identity linkage via factoid embedding. In Proceedings of the ICDM. 1338–1343.
Qianyi Zhan, Jiawei Zhang, Senzhang Wang, Philip S. Yu, and Junyuan Xie. 2015. Influence maximization across partially aligned heterogenous social networks. In Proceedings of the PAKDD. 58–69.
Jing Zhang, Bo Chen, Xianming Wang, Hong Chen, Cuiping Li, Fengmei Jin, Guojie Song, and Yutao Zhang. 2018. MEgo2Vec: Embedding matched ego networks for user alignment across social networks. In Proceedings of the CIKM. 327–336.
Yutao Zhang, Jie Tang, Zhilin Yang, Jian Pei, and Philip S. Yu. 2015. COSNET: Connecting heterogeneous social networks with local and global consistency. In Proceedings of the SIGKDD. 1485–1494.
Zhongbao Zhang, Jian Wen, Li Sun, Qiaoyu Deng, Sen Su, and Pengyan Yao. 2017. Efficient incremental dynamic link prediction algorithms in social network. Knowledge-Based Systems 132 (2017), 226–235.
Zexuan Zhong, Yong Cao, Mu Guo, and Zaiqing Nie. 2018. CoLink: An unsupervised framework for user identity linkage. In Proceedings of the AAAI. 5714–5721.
Fan Zhou, Lei Liu, Kunpeng Zhang, Goce Trajcevski, Jin Wu, and Ting Zhong. 2018. DeepLink: A deep learning approach for user identity linkage. In Proceedings of the INFOCOM. 1313–1321.