Keywords

1 Introduction

Techniques for identifying the groupings in social networks and then analyzing these groupings for further use have become a key research topic in sociology—this is referred to as “community detection”. Through community detection, we can reduce both the network size and the computational complexity of the algorithm used to process it, thereby improving the accuracy of the analysis. However, most of the methods are performed without privacy protection, and the results of community detection are output in the form of the node-set. In order to protect the privacy of users, it is necessary to protect the privacy of community detection.

Existing social network differential privacy protection schemes focus on a centralized model; they assume that third-party data collectors who possess the information are trustworthy, which is a practical assumption of real-world applications. Therefore, we use a local differential privacy (LDP) model to protect the privacy of social networks, by releasing the sanitized graph at local devices after differential privacy processing. The published data mask the interconnections between nodes and preserve the characteristics of the network structure, enabling researchers to achieve a reasonable balance between the utility of the algorithm and its ability to protect privacy when data are used for feature analysis and data mining.

2 Related Work

2.1 Local Differential Privacy

The differential privacy protection methods applied to social networks can be roughly divided according to two approaches. The first approach focuses on publishing certain types of noisy mining results; these include degree distributions, subgraph counts, frequent graphics patterns, and cut queries [4, 5]. This approach uses the properties of the original graph for general purposes, perturbs the graph, and publishes the aggregated results. It is theoretically proven that noise addition ensures strong privacy preservation. The second approach is to publish the entire social network for general purposes [6, 7]. These methods differ in the intermediate structures used for publishing and the corresponding definitions of differential privacy.

2.2 Community Detection with Differential Privacy

The task of finding node groups using connection relationships in the network is referred to as community detection. The Louvain algorithm [9] is based on multi-level optimization modularity and performs well in terms of efficiency and effectiveness. Moreover, the Louvain algorithm can identify hierarchical community structures; thus, it is considered one of the best community detection algorithms.

Recently, researchers have applied differential privacy for community detection. Nguyen et al. [10] chose the Louvain method as the backend community detection method of the input perturbation scheme and proposed the LouvainDP method. Ye et al. [11] proposed the first LDP-enabled graph metric estimation framework for a variety of graph analysis tasks, which address data correlation among nodes by two efficient perturbation algorithms based on adjacency bit vector and node degree.

3 Preliminaries

3.1 Louvain Algorithm

The Louvain algorithm [9] is based on multi-level optimization modularity, which is efficient to identify hierarchical community structures. According to [9], when assigning node \( i \) to a community, the modularity of the community changes as

$$ \begin{aligned} \Delta Q = & \left[ {\frac{{\sum_{in} + k_{i,in} }}{2m} - \left( {\frac{{\mathop \sum \nolimits_{tot} + k_{i} }}{2m}} \right)^{2} } \right] - \left[ {\frac{{\sum_{in} }}{2m} - \left( {\frac{{\sum_{tot} }}{2m}} \right)^{2} - \left( {\frac{{k_{i} }}{2m}} \right)^{2} } \right] \\ = & \frac{1}{2m}\left( {k_{i,in} - \frac{{\sum_{tot} k_{i} }}{m}} \right) = \frac{{k_{i,in} }}{2m} - \frac{{\sum_{tot} k_{i} }}{{2m^{2} }}. \\ \end{aligned} $$
(1)

3.2 Local Differential Privacy

Definition 1 \( (\upepsilon\textbf{-Differential Privacy}) \):

Given a random algorithm \( {\mathcal{A}} \), let \( {\text{S}} \) represent the set of all output spaces of \( {\mathcal{A}} \) on the two neighbor graphs \( {\text{G}}_{1} \) and \( {\text{G}}_{2} \)(which differ at most one element). The algorithm \( {\mathcal{A}} \) satisfies \( \epsilon \)-differential privacy if:

$$ \begin{array}{*{20}c} {\Pr \left[ {{\mathcal{A}}\left( {G_{1} } \right) \in S} \right] \le e^{\epsilon } \times \Pr \left[ {{\mathcal{A}}\left( {G_{2} } \right) \in S} \right]} \\ \end{array} $$
(2)

Theorem 1 (Laplace Mechanism):

For any function \( f:{\text{G}} \to {\mathbb{R}}^{\text{d}} \), the mechanism \( {\mathcal{A}} \)

$$ \begin{array}{*{20}c} {AG = fG + \left( {Lap_{1} \left( {\frac{\Delta f}{\epsilon }} \right), \ldots ,Lap_{d} \left( {\frac{\Delta f}{\epsilon }} \right)} \right)} \\ \end{array} $$
(3)

provides \( \epsilon \)-differential privacy, where \( Lap_{i} \left( {\frac{\Delta f}{\epsilon }} \right) \) represents Laplacian independent and identically distributed variable samples with scale parameter \( \frac{\Delta f}{\epsilon}. \)

Theorem 2 (Exponential Mechanism):

Given a function \( {\text{f}}:\left( {{\text{G}} \times {\text{OS}}} \right) \to {\mathbb{R}} \), where \( {\text{OS}} \) is output space. For a graph \( {\text{G}} \), the mechanism \( {\mathcal{A}} \) that samples an output \( {\text{O}} \) with a probability proportional to \( { \exp }\left( {\frac{{\epsilon \cdot {\text{f}}\left( {{\text{G}},{\text{O}}} \right)}}{{2\Delta {\text{f}}}}} \right) \) satisfies \( \epsilon \)-differential privacy.

Theorem 3 (Sequential Composition):

Let each \( {\mathcal{A}}_{\text{i}} \) provide \( \epsilon_{\text{i}} \)-differential privacy. A sequence \( {\mathcal{A}}_{\text{i}} \left( {\text{G}} \right) \) over the entire graph \( {\text{G}} \) provides \( \sum \epsilon_{\text{i}} \)-differential privacy.

4 Problem Solution

4.1 Differentially Private Louvain Algorithm

We propose a solution to the privacy problems in community detection. Our scheme is divided into two phases. First, the social network is partitioned into multiple independent communities by adopting the community detection algorithm. Then, the privacy of the edges within each independent community is protected.

Algorithm 1 first calculates the sensitivity of the social graph according to its number of nodes (Line 1), and we will explain in detail how to calculate \( \Delta f \) later. Based on the first phase of the Louvain algorithm, each node in the original graph is treated as a separate community (Line 2). Then, according to the node sequence, mining the neighboring nodes/communities of each node (lines 3–4). Then moving the node to different communities and calculating the current modularity gain. Finally, we introduce the exponential mechanism, and the maximum modularity gain is selected, otherwise, it is unchanged (lines 5–9). When the movement of all nodes no longer causes changes of the modularity gain, the first round is completed and the results of the first round of community detection are returned (lines 10–12).

4.2 Edge Probability Perturbation

Community detection makes edge connection within the same community more salient and therefore requires additional protection. We use the same model in [15], which converts each community into an HRG model, then combined the generated HRG model with the edge-connection probability by adding Laplace noise in Algorithm 2.

After we convert the community into an HRG model, we calculate the connection probability of each internal node separately. Further, we introduce the Laplacian mechanism. Subsequently, for any two nodes \( i \) and \( j \) in the community, we find the lowest common ancestor \( r \) and establish a connection between the two nodes \( i \) and \( j \) using the connection probability of the internal node \( r \). Because the inter-community edges are relatively sparse and have low correlations, these direct connections do not provide additional privacy protection.

4.3 Sensitivity Analysis

The sensitivity \( \Delta f \) should be analyzed to complete the selection probability equation, where \( G^{\prime} \) is the neighbor of \( G \). The neighbor of a graph is the graph obtained by changing only one edge. Because the addition or removal operations are similar, only the former is considered in our proof. There are two cases to consider: (1) The connection is an edge inside the community \( P \). (2) The connection is an edge between the community \( P \) and \( S \). Finally, we obtain \( \Delta f \le \frac{3}{m} \).

5 Experiment Evaluation

5.1 Experimental Setup

For comparison purposes, two techniques that are similar to our method were implemented as references. They are the basic differential privacy algorithms for the HRG model, which use the same privacy criteria as [13] and the algorithm perturbation presented in [10]; the results of the previously centralized differential privacy scheme and the algorithm perturbation scheme and our proposed scheme are labeled as “DP”, “MD” and “LLDP” respectively. We performed experiments on two real datasets to evaluate our algorithm. The details of datasets are shown in Table 1.

Table 1. Statistics of the datasets.

5.2 Experiment Evaluation of Community Detection

The real social network dataset we chose did not have standard community detection results. Thus, we chose the output of the Louvain algorithm as a standard control because the evaluation of the data had been performed in [15], and the Louvain method had been proven there to provide high-quality results.

The partitioning results of different privacy budgets are shown in Fig. 1. For both datasets, the results given by our algorithm increased with the increase of the privacy budget and gradually stabilized after reaching a certain value. This provides an effective reference for the selection of a privacy budget. In general, the effectiveness of algorithm perturbations is higher than that of input perturbations. The input disturbance used for comparison also followed a similar trend; however, the overall value was low, which is significantly different from its typical value. Newman [8] suggested that the value of \( Q \) in a general network is between 0.3 and 0.7, which can explain a good community structure. Therefore, although the modularity of the results obtained by our algorithm was lower than the real situation, it still retained an effective community structure.

Fig. 1.
figure 1

The modularity of two social network datasets under different ϵ

6 Conclusion

We analyzed the privacy problems of community detection can lead to and proposed a differentially private detection procedure based on the Louvain algorithm. Moreover, we proposed to further protect the relational data within the community by converting the individual community into a subgraph of an HRG model and subsequently calculating the edge connection probability by adding Laplacian noise. Experimental results indicated an improved performance on real data.