Abstract
In recent years, Geo-Indistinguishability (GeoI) has been increasingly explored for protecting location privacy in location-based services (LBSs). GeoI is considered a theoretically rigorous location privacy notion since it extends differential privacy to the setting of location privacy. However, GeoI does not consider the road network, which may cause insufficiencies in terms of both privacy and utility for LBSs over a road network. In this paper, we first empirically evaluate the privacy guarantee and the utility loss of GeoI for LBSs over road networks. We identify an extra privacy loss when adversaries have the knowledge of road networks and the degradation of LBS quality of service. Second, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for protecting location privacy for LBSs over a road network and design a Graph-Exponential mechanism (GEM) satisfying GeoGI. We also show the relationship between GeoI and GeoGI to explain theoretically why GeoGI is a more suitable privacy notion over road networks. Finally, we evaluate the empirical privacy and utility of the proposed mechanism in real-world road networks. Our experiments confirm that GEM achieves higher utility for LBSs over a road network than the planar Laplace mechanism for GeoI under the same empirical privacy level.
This work was supported by JSPS KAKENHI Grant Numbers (S) No. 17H06099, (A) No. 18H04093, (C) No. 18K11314.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
In recent years, the spread of smartphones and the improvement of GPS has led to a growing use of location-based services (LBSs). While such services have provided enormous benefits to individuals and society, the exposure of the user’s location raises privacy issues. By using the location information, it is easy to identify sensitive personal information, such as that pertaining to home and family. Many methods for protecting location information have been proposed in the past decade. Most of these methods perturb the true location by using a location privacy-preserving mechanism before sending it to an LBS provider or sharing it with a third party. A mechanism takes a true location as input and outputs a perturbed location that follows a probability distribution over a location domain.
Andr\(\acute{\mathrm{e}}\)s et al. [10] defined a formal notion of location privacy based on the well-known concept of differential privacy. The definition is called geo-indistingui-shability (GeoI), and an output of a mechanism achieving it guarantees indistinguishability of the true location from other locations, that is, strong privacy protection. This notion is derived from differential privacy [5], which provides a rigorous guarantee of indistinguishability of two neighboring databases. One of the most appealing features of GeoI, inherited from differential privacy, is the guarantee of privacy protection against any attacker to a certain degree.
However, since the proposal of GeoI, many studies [3, 14, 25] have identified its weaknesses. Yu et al. [25] showed that GeoI did not protect location privacy against the optimal inference attack [16], and they proposed the framework that adapted GeoI to the expected inference error [17], a complementary notion of GeoI. Chatzikokolakis et al. [3] focused on the fact that GeoI did not consider privacy for semantic information (such as population density) and proposed a mechanism that guarantees the protection of this privacy by using a graph whose weight of edges contains such information. However, this graph does not consider a road network. Oya et al. [14] quantified the privacy against an adversary who knows that the true location is one of two locations. The researchers showed that an output of the mechanism achieving GeoI results in a worse quality of service to protect the user’s location privacy.
In this study, we find that GeoI provides inadequate privacy guarantee and insufficient utility for some LBSs over road networks, such as the k-nearest neighbor search (e.g., searching for k restaurants nearest to the user location). In such LBSs, the quality of service can be improved by taking advantage of the road network instead of the Euclidean space. This is because objects can usually move only on a predefined set of trajectories as specified by the road network and it is natural for these LBSs to use the distance on the road network [4, 9, 15]. GeoI may not be practical for LBSs over a road network because it assumes that (1) the perturbed location can be any location in a continuous plane and (2) the distance between locations is measured by the Euclidean distance.
Due to assumption (1), GeoI may result in unexpected privacy loss. For example, as shown in Fig. 1, if a user’s perturbed location is unreasonable, such as a position on the sea, an adversary can realize that such a location is impossible, which may cause unexpected privacy leakage. Next, due to assumption (2), GeoI may offer inadequate utility for LBSs over a road network. Taking k-nearest neighbor search using shortest path for example, as shown in Fig. 2, a user of LBS searching the nearest restaurant expects that restaurant 2 will be returned in a higher probability than restaurant 1. This is because the path to restaurant 1 is farther than to restaurant 1 because of the river. However, if the user uses a GeoI mechanism, the probabilities outputting restaurants 1 and 2 are the same since the two Euclidean distances between restaurants and the user are the same.
In this paper, we study how to protect location privacy while preserving high utility for LBS over a road network. Our contributions are threefold. First, we identify two insufficiencies of GeoI by two empirical evaluations. We quantitatively analyze the change of the inference error w.r.t. adversaries having the knowledge of road network or not when the location is protected by the planar Laplace mechanism that Andrés et al. [10] proposed, and show its privacy leakage. Additionally, we propose the formulation of the utility of the mechanism for LBSs over a road network. We compute its utility of the planar Laplace mechanism on a graph (PLG), which is our extension of PLM for LBSs over a road network, on two real-world road networks, which shows that it performs poorly w.r.t. this formulation. Second, we propose a new privacy notion based on differential privacy, called Geo-Graph-Indistinguishability (GeoGI) that takes a road network into consideration so that we can construct a more suitable mechanism than GeoI mechanism for protecting location privacy in LBSs over a road network. We design a Graph-Exponential Mechanism (GEM) satisfying GeoGI. We also show the relationship between GeoI and GeoGI to explain theoretically why GeoGI is more suitable privacy notion for LBSs over a road network. Third, to better understand the proposed mechanism, we empirically evaluate privacy and utility of the approach in the case of two real-world road networks in Japan; the results verify that the proposed GEM for GeoGI achieves higher utility than PLM for GeoI when both mechanisms have the same empirical privacy level.
2 Preliminary and Problem Setting
2.1 Geo-Indistinguishability [10]
In this section, we describe the definition of Geo-Indistinguishability (GeoI). Let \(\mathcal {X}\) be a set of locations and let \(\mathcal {Z}\) be a set of query outputs. Intuitively, a mechanism K achieving GeoI guarantees that K(x) and \(K(x^\prime )\) are similar to a certain degree for any two locations \(x,x^\prime \in \mathcal {X}\). This means that even if an adversary obtains an output of the mechanism, he cannot distinguish the true location from other locations to a certain degree.
The multiplicative distance \(d_\mathcal {P}\) that expresses the distance between two probability distributions \(\sigma _1\) and \(\sigma _2\) on \(\mathcal {S}\) is defined as \(d_\mathcal {P}(\sigma _1,\sigma _2)=\sup _{S\in \mathcal {S}}|\ln {\frac{\sigma _1(S)}{\sigma _2(S)}}|\) with the convention that \(|\ln {\frac{\sigma _1(S)}{\sigma _2(S)}}|=0\) if both \(\sigma _1\) and \(\sigma _2\) are zero and \(\infty \) if only one of them is zero. \(d(x,x^\prime )\) represents the Euclidean distance between x and \(x^\prime \). Given \(\epsilon \in \mathbb {R^{+}}\), \(\epsilon \)-GeoI is defined as follows.
Definition 1
(\(\epsilon \)-geo-indistinguishability). The mechanism satisfies \(\epsilon \)-GeoI iff
\(\forall x,x^\prime \in \mathcal {X},Z\subseteq \mathcal {Z}\)
Mechanism Satisfying \(\varvec{\epsilon }\)-GeoI. The authors of [10] introduced a mechanism called planar Laplace mechanism (PLM) for achieving \(\epsilon \)-GeoI. The probability distribution that PLM generates is called the planar Laplace distribution and, as its name suggests, is derived from a two-dimensional version of the Laplace distribution as follows: \(\Pr (PLM_\epsilon (x)=z) = \frac{\epsilon ^2}{2\pi }\mathrm {e}^{-\epsilon d(x,z)}\).
2.2 Problem Statement
As we described in Sect. 1, some LBSs improve their services by using a road network. In this paper, we assume these LBSs, where the road network G is defined as a weighted undirected graph (V, E). Let V be a set of vertices which represent points on the road network, each of which has a coordinate on the Euclidean plane, and let E be a set of edges. The weight w(a, b) of edges between connected vertices \(a\in V\) and \(b\in V\) represents the shortest distance of the road on the Euclidean plane connecting the two vertices a and b, which leads to \(w(a,b)\ge d(a,b)\). Then, \(d_s(v,v^\prime )\) represents the shortest path length between \(v\in V\) and \(v^\prime \in V\), and following inequality holds. This is because \(d(v,v^\prime )\) stands for the shortest distance as the crow flies while \(d_s(v,v^\prime )\) stands for the shortest distance on the road network.
In these LBSs, a user who wants to receive the service sends a vertex that represents his location to an untrusted LBS provider, and the LBS provider performs the service’s computations w.r.t. the vertex (using the road network) and provides the service. Furthermore, we assume that there is no trusted server. Hence, a user needs to protect privacy on his device by himself.
Quantification of Utility and Privacy Guarantee. When a user uses a mechanism to protect his privacy, the quality of the service the user receives degrades. Shokri et al. [16] generally quantified this quality loss, referred to as service quality loss (SQL), in LBSs when a user uses mechanism K:
Here, \(\pi _u\) is the probability distribution representing the probability of user’s location, called a prior of the user. \(d_q(r,r^\prime )\) represents the metric of a degree of dissimilarity, which depends on the LBS. Thus, this means the expected value of a degree of dissimilarity between the actual location of the user and the location obfuscated by mechanism K. Shokri et al. used the Euclidean distance from a natural idea that the longer the Euclidean distance between the true location and the obfuscated location is, the worse the service becomes. We call this SQL\(_e\).
Shokri et al. also quantified the degree of a privacy protection by a mechanism. The researchers translated location privacy into adversarial error (AE) by measuring how accurately an adversary could infer the user’s true location. Formally, AE can be formulated as follows:
Here, \(\pi _a\) is the probability distribution representing the adversarial knowledge about the user’s actual location, called a prior of the adversary. An inference mechanism h outputs an inferred point by drawing a point according to the probability distribution when given an obfuscated point, which stands for the inference of the adversary. Thus, \(\Pr (h(r^\prime )=\hat{r})\) means the probability of estimating \(\hat{r}\) as the actual location of the user when the adversary observes \(r^\prime \). Therefore, AE represents the expected value of a degree of dissimilarity \(d_q(\hat{r},r)\) between the user’s true location r and the location \(\hat{r}\) the adversary infers. As in the case of SQL, the researchers used the Euclidean distance as \(d_q\).
The major notations in this paper are summarized in Table 1.
3 Evaluating Privacy and Utility of Geo-Indistinguishability
In this section, we empirically show two insufficiencies of GeoI caused by considering a road network. First, we describe the model of an adversary [17]. Then, we show the insufficiency of the privacy guarantee by modeling an adversary who considers the road network and quantifying the accuracy of the attack of this adversary. Next, we describe the formulation of the utility of an output of the mechanism [16]. Then, we propose the way of applying it for LBSs over a road network, and we show by experimentations with two real-world road networks that PLM performs poorly for this formulation.
3.1 Empirical Privacy Evaluation
We assume that the adversary also uses a road network to infer the user’s actual location because a road network is publicly available. In the paper [10], the authors did not consider such an adversary, and we empirically show that if the adversary considers a road network, this may lead to privacy leakage even if the mechanism satisfies GeoI, which is referred to as an insufficiency of the privacy protection.
Adversarial Model. First, we describe the model of the adversary who tries to infer the user’s actual location. Shokri et al. [17] modeled the adversary who knows the prior \(\pi _a\) and the mechanism that the user uses and can solve problems with any computational complexity. Although this assumption is advantageous for the adversary, showing the protection against this adversary will guarantee strong privacy. When the adversary obtains the user’s obfuscated location \(r^\prime \), he tries to infer the user’s true location by the optimal inference attack. In this attack, an adversary solves the following mathematical optimization problem and obtains the optimal probability distribution and constructs the optimal inference mechanism h w.r.t. his knowledge; by using this mechanism with input \(r^\prime \), he estimates the user’s true location.
We model an adversary who knows a road network in this way. If an adversary knows a road network, the domain of his prior \(\pi _a\) is V, and \(d_p\) is \(d_s\) because we assume that the adversary also tries to improve his inference w.r.t. the shortest distance. In this setting, this is a linear programming problem because \(\Pr (h(r^\prime )=\hat{r})\) represents a variable and the other terms are constant so that the objective function and the constraints are linear. We solve this problem using CBC (coin-or branch and cut)Footnote 1 solver of the PuLP library of Python.
Experiment. In the following paragraph, we show that the adversary who knows a road network can attack with higher accuracy than can the adversary who does not know it. To make this easy to understand, we use a simple synthetic data illustrated in Fig. 3.
This map consists of 1600 squares with the side length of 100 m; that is, the area dimensions are 4000 m * 4000 m, and each lattice point has a coordinate. The red line through the center represents the road where the user is considered to be, and the other area represents locations where the user must not be, such as on the sea.
Thus, we assume that the user’s location is determined according to a uniform distribution on the red line and inside the black frame to make an output of the mechanism planarly spread; the adversary who does not consider uses the prior given by the uniform distribution inside the black frame, and the adversary who considers the road network uses the prior given by the uniform distribution on the red line inside the black frame. Then, Fig. 4 shows AE\(_e\) of each adversary w.r.t. the privacy parameter \(\epsilon \) of the mechanism. Comparing AE\(_e\) of both adversaries, it is clear that the adversary with the prior considering the road network can estimate the user’s true location more accurately.
Remark 1. This results from that the PLM could obfuscate the user’s location to a place where the user must not be; in the case of Fig. 3, anywhere except the red line. Thus, determining this place contributes to the accuracy of the attack. Figure 4 shows that an adversarial error could become approximately half in this particular case, so this is an insufficiency of the privacy protection of GeoI.
3.2 Utility
If a user uses LBSs over a road network and uses a mechanism on the Euclidean plane, such as PLM, the user or the LBS provider needs to map the perturbed location to a vertex of the road network because the LBS provider presumes that the user is located at a vertex of the graph to take advantage of a road network. For example, in the LBS that searches for the nearest restaurants, the LBS provider needs to compute the shortest path length between vertices where the user and restaurants are located. If the user is located outside of the graph, the shortest path length cannot be computed.
Then, it is worth noting that if the user (rather than the service provider) performed the mapping to the vertex before the user sent the perturbed location, it would prevent the perturbed location from being the location where the user must not be and would improve the privacy protection. In this view, we propose the mechanism on the graph, which is defined as the algorithm that outputs the perturbed vertex when given a vertex. Then, we can formulate SQL on a road network as SQL\(_s\) using the shortest path length as the metric.
In this formulation, we anticipate the lower utility of an output of the mechanism achieving GeoI because it cannot consider SQL\(_s\) due to the definition using the Euclidean distance. For example, if there is a river with no bridges so that the user cannot cross it, the opposite riverside is far away and obfuscating to the opposite riverside results in the lower utility (see Fig. 2). However, GeoI may consider the opposite riverside to be close.
Then, we can also formulate the adversarial error over a road network that we call AE\(_s\) as follows:
This formula expresses the expected value of the shortest path length between the actual location and the location that an adversary with the prior of \(\pi _a\) infers with inference mechanism h when a user uses mechanism K. Intuitively, this represents the adversarial error on a road network because we use the shortest path length \(d_s\) as the metric. Thus, when the adversary can infer the true location on the road network with a high accuracy, the formula (7) will have a small value. It can be stated that if AE\(_s\) is small, the privacy protection level of the mechanism is low.
Experiments. Here, we empirically show that the mechanism satisfying GeoI may perform worse w.r.t. SQL\(_s\) than we expect, since the definition of GeoI considers SQL as SQL\(_e\). To illustrate this, we compare SQL\(_s\) and SQL\(_e\) of the same GeoI mechanism. As we stated, we use a mechanism on a graph for LBSs over a road network. Then, we propose a natural and straightforward way of converting PLM to a planar Laplace mechanism on a graph (PLG). First, a user perturbs the location using PLM. Next, the user maps the perturbed location on the Euclidean plane to the nearest vertex on the road network. Figure 5 is an example of an output of PLG. Formally, we formulate this mechanism as follows:
Here, \(x_v\) is the coordinate of vertex v, and let \(S_w\) be a Voronoi cell of vertex w when the Voronoi diagram created from the graph on the Euclidean plane is given.
Theorem 1
PLG\(_\epsilon \) satisfies \(\epsilon \)-GeoI on the graph.
We refer the reader to the appendix for the proof. Thus, it is assumed that this is a straightforward way of GeoI mechanism in our setting where a user needs to output a vertex as we stated in Chap. 2.2, and because we cannot use PL, it is reasonable to use this mechanism instead of PL. We compute SQL\(_s\) and SQL\(_e\) on two road networks. We used OpenStreetMapFootnote 2 to retrieve two maps of areas of two cities, Tokyo and Akita, in Japan with dimensions of 4000 m * 4000 m as in Fig. 6.
We plot the results in Figs. 7 and 8 around the range where SQL is reasonable. If the user uses the same mechanism (i.e., the same \(\epsilon \)), it is observed that the utility for the LBS over a road network is worse. This outcome is caused by the difference of the Euclidean distance and the shortest path length between two vertices. Additionally, it is observed that the difference between SQL\(_s\) and SQL\(_e\) on the Akita graph is larger than that on the Tokyo graph at the same SQL\(_e\) because the difference between the two distances on the Akita graph is larger than that on the Tokyo graph.
Remark 2. GeoI constrains the mechanism to use the Euclidean distance so that the mechanism cannot improve its utility of the output for LBSs using a road network. Regardless of how hard we try to improve the utility of the mechanism output, as long as there is the constraint of GeoI, the mechanism cannot consider a road network, and we cannot improve the mechanism. Additionally, we showed that SQL\(_s\) of PLG is worse on two graphs; however, there may be a road network that results in much worse utility than we showed. Unless the mechanism considers a road network, the mechanism cannot guarantee high utility.
4 Geo-Graph-Indistinguishability
In this section, we propose a new definition of location privacy called Geo-Graph-Indistinguishability (GeoGI) for LBSs using a road network, which is tolerant to the weaknesses of GeoI. We first formally define GeoGI, and then propose a mechanism satisfying GeoGI which is called Graph-Exponential Mechanism (GEM). Finally, we clarify the relationship between GeoI and GeoGI, and describe validity of the definition of GeoGI.
4.1 Definition
Given a graph \(G=(V,E)\) representing a road network, let \(\mathcal {W}\) be a set of vertices that a mechanism outputs. Then, mechanism K on the graph returns the random vertex \(w\in \mathcal {W}\) according to a probability distribution when given a vertex \(v\in V\). Then, given \(\epsilon \in \mathbb {R^+}\), \(\epsilon \)-Geo-Graph-Indistinguishability is defined as follows.
Definition 2
(\(\epsilon \)-Geo-Graph-Indistinguishability). A mechanism K on a road network \(G=(V,E)\) satisfies \(\epsilon \)-Geo-Graph-Indistinguishability iff \(\forall v,v^\prime \in V, \forall W\subseteq \mathcal {W}\),
The definition can be also formulated as \(\forall v,v^\prime \in V, \forall W\subseteq \mathcal {W},\frac{\Pr (K(v)\subseteq W)}{\Pr (K(v^\prime )\subseteq W)} \le \mathrm {e}^{\epsilon d_s(v,v^\prime )}\). This formulation implies that GeoGI is an instance of \(d_\mathcal {X}\)-privacy [7] proposed by Chatzikokolakis et al. as are GeoI and differential privacy. The authors showed that an instance of \(d_\mathcal {X}\)-privacy guaranteed strong privacy. We refer the reader to the appendix for further details. Intuitively, this definition guarantees that for any \(v,v^\prime \in V\), the closer to \(v^\prime \) a vertex v is w.r.t. the shortest path length, the more similar K(v) and \(K(v^\prime )\) are.
It is worth noting that the definition of GeoGI includes a given graph representing a road network, and this results in the privacy protection level and utility varying depending on the road network even if the privacy parameter \(\epsilon \) remains the same.
4.2 Graph Exponential Mechanism
In this section, we propose a mechanism that achieves \(\epsilon \)-GeoGI. Given parameter \(\epsilon \in \mathbb {R^+}\) and graph \(G=(V,E)\), we define \(GEM_\epsilon \) for any user’s location \(v\in V\) and perturbed location \(w\in \mathcal {W}\) as follows.
Definition 3
\(GEM_\epsilon \) takes v as an input and outputs w with the following probability.
where \(\alpha \) is a normalization factor and \(\alpha (v) = \frac{1}{\sum _{w\in V}\mathrm {e}^{-\frac{\epsilon }{2} d_s(v,w)}}\).
The pseudocode of GEM is described in Appendix (Sect. 8.3) due to space limitation. This mechanism employs the idea of exponential mechanism [12] that is one of the general mechanisms for achieving differential privacy.
Theorem 2
GEM\(_\epsilon \) satisfies \(\epsilon \)-GeoGI.
We refer the reader to the appendix for the proof. This mechanism considers a road network so that high utility for LBSs over a road network can be expected. Moreover, since this mechanism satisfies GeoGI, strong privacy based on differential privacy is guaranteed.
Creating the Probability Distribution and Drawing a Random Point. Because we assume that the LBS provider is untrusted and there is no trusted server, a user needs to create this distribution by himself and choose the perturbed vertex according to the distribution. In this section, we describe a method to do this and its issues caused by the number of vertices.
To create the probability distribution, (i) the user gets shortest path lengths to all vertices from the vertex where the user is located. (ii) Then the user computes \(\mathrm {e}^{-\frac{\epsilon }{2}d_s}\) and based on this distribution, (iii) chooses a point.
Phase (i) is acceptable if the server which has enough computing power computes the all shortest lengths and sends users it in advance. This is because the shortest path length can be computed by Dijkstra’s algorithm; this computational complexity of this operation depends on the data structure. if we use a naive method, it is \(O(|E|+|V|^2)\), and it can be improved by using Fibonacci heap to \(O(|E|+|V|\log |V|)\), where |V| and |E| represent the counts of edges and vertices. However, There is a problem if the user needs to compute it and the size of vertices is large because the user uses a mobile phone with limited computing power. So we have to consider the better algorithm. On a road network, a fast algorithm computing the shortest path length has been studied; we refer the reader to [1] that may be applied to our algorithm. Phase (ii) is no problem because it computational complexity is O(|V|). For phase (iii), when the number of vertices is much larger than we expected, we may not be able to effectively sample the vertices according to the distribution. This problem has also been studied and is known as consistent weighted sampling (CWS): we refer the reader to [11, 23]. We believe that these studies can be applied to our algorithm and it can be computed even if the size of vertices is somewhat large.
4.3 Analyzing the Relationship Between GeoI and GeoGI
In this section, we describe the relationship between GeoI and GeoGI. There are two major differences: one is the domain, and the other is the distance metric.
First, we state the difference of the location domain. We can design a GeoI mechanism on the Euclidean plane; however, the same cannot be done for GeoGI because GeoGI constrains a mechanism to use a vertex on the graph due to the definition using the shortest path length. Since we exploit the mechanism for LBSs using a road network, the constraint does not pose a problem. Moreover, this constraint prevents the perturbation to a location where the user must not be located and improve the privacy guarantees, as we stated in Sect. 3.1. We refer the mechanism which meets this constraint as a mechanism on the graph.
The other difference is the used metric. In this part, we assume the mechanism on a graph; otherwise, we cannot design a GeoGI mechanism. Then, the following theorem holds due to the used metric of GeoGI.
Theorem 3
If a mechanism on the graph satisfies \(\epsilon \)-GeoI, it satisfies \(\epsilon \)-GeoGI.
This is due to the definition of a graph that represents the road network. Using Inequality (2), we derive the following inequality:
This inequation shows that the GeoI mechanism is also the GeoGI mechanism, but the reverse is not always true. For example, PLG satisfies both GeoI and GeoGI. This means that GeoGI relaxes the restriction of GeoI. Thus, we can design a more suitable mechanism which improves the utility for a road network.
It is worth noting whether this relaxing of the definition leads to weakening of the guarantees of privacy protection. In short, GeoGI has no guarantees of privacy protection w.r.t. the Euclidean distance so that if a user uses a mechanism that satisfies GeoGI to protect the location, the adversary may easily distinguish the user’s location in terms of the Euclidean distance. In what follows, we show this fact using the notion of the true probability (TP). The probability that an adversary can distinguish user’s location is represented as
Here \(\delta (\hat{v},v)\) is a function that returns 1 if \(\hat{v}=v\) holds and otherwise returns 0. TP means the expected value of a probability that an adversary can remap the obfuscated location to the true location.
We assume a set of graphs, each of which has only two vertices. The Euclidean distance between the vertices is the same for all the graphs, but the weight of the edge between them is different for each graph (Fig. 9). Next, we assume that each prior the user and the adversary have is a uniform distribution on two vertices of this graph, and we compute TP of PLG and GEM. Figure 10 shows the change of TP when the weight, that is, the shortest path length, changes. Due to the guarantee of the Euclidean distance of GeoI, PLG does not degrade TP even if the shortest path length changes, however, since GeoGI does not have a guarantee of the Euclidean distance, GEM significantly degrades TP, which means that the adversary can know the user’s true location.
GeoGI can achieve better utility than can GeoI by guaranteeing privacy protection in terms of the shortest path length instead of the Euclidean distance. This idea comes from the interpretation of privacy; in this paper, we assume that privacy and the utility can be interpreted as the shortest path length on the graph and that it should be acceptable for LBSs on a road network. Therefore, GeoGI may not be suitable for protecting location privacy if the privacy should be interpreted as the Euclidean distance, e.g., querying the weather conditions where we need to protect a wide range of locations.
4.4 Discussion
We assume that a graph representing the road network is given in GeoGI. We note that setting a different graph (road network) in the definition of GeoGI implies a different privacy model. Thus, GeoGI can be personalized. For example, a conservative user may want to use a global graph in GeoGI that covers all possible locations on Earth, while a liberal user may use a smaller graph in GeoGI that only covers the city of her residence. The graph may also depend on application scenarios in practice. For example, if the application is vehicle navigation, the graph should cover all highway road network instead of pedestrian lanes. In summary, the privacy level and utility depend on the shape of the graph, such as its density and size, and its relationships should be shown because this has the potential to lead to improvement of the privacy protection and utility. This topic is left to future research.
Additionally, the utility and the privacy protection level depend on the vertex where the user uses the GeoGI mechanism because, as opposed to the Euclidean plane that spreads uniformly, each vertex relates differently with other vertices and because to satisfy GeoGI, the mechanism needs to vary the probability distribution depending on the vertex’s relationship. This complexity obscures the mechanism performance for a user, and a user will not know how to adjust the privacy parameter. Then, we propose a way of measuring the performance of the mechanism used by a user located at a certain vertex. We formulate the mechanism’s utility as SQLr\(_s\) and its privacy protection level as AEr\(_s\) as follows:
When a user is located at vertex v, SQLr\(_s\) represents the expected value of the shortest path length between v and the perturbed vertex \(v^\prime \). AEr\(_s\) represents the expected value of the shortest path length between v and vertex \(\hat{v}\) inferred by the adversary with inference mechanism h (in this case, we assume optimal inference attack). We show SQLr\(_s\) and AEr\(_s\) of the Akita graph using GEM with \(\epsilon =0.002\) in Figs. 11 and 12, respectively. As we can see, the utility loss (i.e., SQLr\(_s\)) and privacy (i.e., AEr\(_s\)) differ on different locations in spite of the same privacy parameter. We can develop a tool to visualize the privacy and utility of the mechanism under different privacy parameters, which may help users to determine a proper privacy parameter. We defer this to a long version of this work.
5 Experiments
In this section, we show that GEM outperforms the GeoI mechanism in terms of utility and privacy protection for LBSs on a road network. To demonstrate this conclusion, we performed two experiments as follows. First, since GEM, in contrast to PL, may perturb the input location to a location that is out of the road network, such as on the sea (as stated in Sect. 3.1), it is assumed that GEM achieves better privacy guarantee when the adversaries have the knowledge of road networks. To show this, we computed AE\(_e\) of GEM on the synthetic graph we used in Sect. 3.1. Next, because GEM, in contrast to PLG, considers a road network (as stated in Sect. 3.2), it is assumed that the output quality of GEM is higher than PLG. To show this, we computed SQL\(_s\) of GEM for two cities we used in Sect. 3.2. Additionally, we compared the results with those of PL and PLG.
5.1 Privacy Protection Level of GEM
We computed AE\(_e\) of GEM on the graph of Fig. 3. As in Sect. 3.1, we assume that the adversary knew the road network so that his prior was a uniform distribution on the red line inside the black frame. Since GEM, in contrast to PL, outputs only the locations on the red line, it is assumed that AE\(_e\) of GEM is higher than that of PLM. To fairly compare AE\(_e\) of each mechanism, we performed the comparison under the same utility SQL\(_e\).
As is shown in Fig. 13, AE\(_e\) of GEM is higher than that of PL in case of the adversary who knows a road network. This means that GE can protect user privacy more strongly than can PL because GE guarantees that the output is on the road network.
5.2 Utility of GEM
We computed SQL\(_s\) of GEM on two graphs of Fig. 6. Since GEM considers the road network, it is assumed that SQL\(_s\) of GEM is higher than that of PLG. To fairly compare SQL\(_s\) of each mechanism, we performed the comparison under the same AE\(_s\). Then, we assumed that both the priors that a user and an adversary have are uniform distributions on the graph with a range of 2000 m from the centers of maps.
As we can see from Figs. 14 and 15, SQL\(_s\) of GEM is lower than that of PLG. Thus, a GEM output has higher utility for LBSs using a road network than does a PLG output. Additionally, it can be said that the difference of the SQL between PLG and GEM is larger on the Akita graph than on the Tokyo graph. The reason is that the difference between the Euclidean distance and the shortest path length is larger for vertices of the Akita graph.
6 Related Work
6.1 Location Privacy on a Road Network
To the best of our knowledge, this is the first study to propose the perturbation with the differential privacy approach over the road network. However, several studies explored location privacy on a road network.
Tyagi et al. [20] studied location privacy over a road network for VANET users, and they show that there are no comprehensive privacy-preserving techniques or frameworks that cover all privacy requirements or issues to maintain the desired level of location privacy.
Wang et al. [21] and Wen et al. [22] proposed the method of privacy protection for the user who wishes to receive location-based services and travels over roads. The authors use k-anonymity as the protection method and take advantage of the road network constraints.
A series of key features distinguish our solution from these studies: (a) we use the differential privacy approach so that our solution has a guarantee of privacy protection against any attacker and (b) we assume that there is no trusted server. We highlight these two points as advantages of our proposed method.
6.2 State-of-the-Art Privacy Models
Since GeoI was published, many related applications have been proposed. To et al. [18] developed an online framework of privacy-preserving spatial crowdsourcing service using GeoI. Tong et al. [19] proposed a framework of privacy-preserving ridesharing service based on GeoI and differential privacy approach. It may be possible to improve these applications by using GeoGI instead of GeoI. Additionally, Bordenabe et al. [13] proposed an optimized mechanism satisfying GeoI; it may be possible to apply this method to GeoGI.
According to [10], if a user uses the GeoI mechanism multiple times, this causes privacy degradation due to correlations in the data; this scenario also applies to GeoGI. This issue remains a difficult and intensely investigated problem in the field of differential privacy. There are two kinds of approaches attempting to solve this problem. The first is to develop a mechanism for multiple perturbations that satisfies existing notion, such as differential privacy and GeoI [6, 8]. Kairouz et al. [8] studied the composition theorem and proposed a mechanism that upgrades the privacy guarantee. Chatzikokolakis et al. [6] proposed a method of controlling privacy using GeoI when locations are correlated. The second approach is to propose a new privacy notion for correlated data [2, 24]. Xiao et al. [24] proposed \(\delta \)-location set privacy to protect each location in a trajectory when a moving user sends locations. Cao et al. [2] proposed PriSTE, a framework for protecting spatiotemporal event privacy. We believe that these studies can be applied to our work.
7 Conclusion and Future Work
In this paper, by evaluating privacy and utility of PL, we have shown that the definition of GeoI is insufficient for LBSs over a road network to protect privacy and output the useful perturbed location. The core of our proposal is a new notion of privacy that we call GeoGI, which takes the place of GeoI for such LBSs, and a mechanism GEM, based on the exponential mechanism, to perturb the user location. We have shown how GeoGI relates to GeoI and that GeoGI is a more suitable privacy definition for such LBSs w.r.t. privacy protection and utility. We also have shown the effectiveness of our proposed approach by comparing GEM with PLG in the example of two cities in Japan.
In the future, we aim to extend the privacy model to several graphs. Although in this paper, we represented a road network as an undirected graph, it should be represented as a directed graph because of the existence of one-way roads, and this may degrade the utility. Additionally, we need to consider the movement mode such as walking, driving, and flying. Finally, we need to pay attention to the fact that multiple perturbations of correlated data such as trajectory data may degrade the level of protection even if the mechanism satisfies GeoGI as in case of GeoI and differential privacy. This topic has been intensely studied, and we believe that it can be applied to GeoGI. We plan to solve these problems in future research.
8 Appendix
8.1 Proofs
Theorem 1
PLG\(_\epsilon \) on graph \(G=(V,E)\) satisfies \(\epsilon \)-GeoI on graph G.
Proof
This proposition can be formulated as follows for all vertices \(v,v^\prime \in V,W\subseteq \mathcal {W}\),
Furthermore, we derive
Since PL\(_{\epsilon }\) satisfies \(\epsilon \)-GeoI, for all \(z\in \mathcal {Z}\), we derive
By the theorem of integral inequality we obtain
This concludes the proof.
Theorem 2
GEM\(_\epsilon \) satisfies \(\epsilon \)-GeoGI.
Proof
This proposition can be formulated for all vertices \(v,v^\prime \in V,W\subseteq \mathcal {W}\):
The ratio of \(\Pr (GEM(v)=w)\) and \(\Pr (GEM(v^\prime )=w)\) is expressed as follows:
When \(-d_s(v,w)+d_s(v^\prime ,w)\) has the maximum value for \(w\in {W}\), (21) reaches the maximum value too. Due to the triangle inequality, the inequality \(\forall w \in {W},-d_s(v,w)+d_s(v^\prime ,w)\le -d_s(v,w)+d_s(v^\prime ,v)+d_s(v,w)=d_s(v,v^\prime )\) holds, and the following inequality is derived:
Next, we show that the following inequality holds:
The inequality 23 is expressed as follows for any \(v,v^\prime \in V,w\in W\) of any graph G:
Using the triangle inequality, we have \(\forall w \in {W},d(v,w)-d(v,v')\le d_s(v',w)\), \(\mathrm {e}^{-\frac{\epsilon }{2}d_s(v',w)} \le \mathrm {e}^{-\frac{\epsilon }{2}( d_s(v,w)- d_s(v,v'))}\). Therefore,
Using (22) and (23), we obtain
8.2 \(d_\mathcal {X}\)-privacy
As we stated in Sect. 4, GeoGI is an instance of \(d_\mathcal {X}\)-privacy: due to this characterization, we can give two characterizations of GeoGI that mathematically show the guarantee of strong privacy protection. In this section, we stated the characterizations of GeoGI.
Hiding Function. The first characterization uses the concept of a hiding function \(\phi :V\rightarrow V\). For any hiding function and a secret location \(v\in V\), when an attacker who has a prior distribution that expresses the user’s location information obtains each output \(w\sim K(v),w^\prime \sim K(\phi (v))\) of a mechanism satisfying \(\epsilon \)-GeoGI, the following inequality holds for the multiplicative distance between its two posterior distributions:
Let \(d_s(\phi (v))=sup_{v\in V}d_s(v,\phi (v))\) be the maximum distance between an actual vertex and its hidden version. This inequality guarantees that the adversary’s conclusions are the same (up to \(2\epsilon d_\mathcal {X}(\phi )\)) regardless of whether \(\phi \) has been applied or not.
Informed Attacker. The other characterization is shown by the multiplicative distance between the prior distribution and its posterior distribution that is derived by obtaining an output of the mechanism. By measuring its distance, we can determine how much the adversary has learned about the secret. We assume that an adversary (informed attacker) knows that the vertex v where the user is located in N. When the adversary obtains an output of the mechanism. The following inequality holds for the multiplicative distance between his prior distribution \(\pi _{|N}(v)=\pi (v|N)\) and its posterior distribution \(p_{|N}(v|w)=p(v|w,N)\):
Let \(d_s(N)=max_{v,v^\prime \in N}d_s(v,v^\prime )\) be the maximum distance between vertices in N. This inequality guarantees that when \(d_s(N)\) is small, the adversary’s prior distribution and its posterior distribution are similar. In other words, the more the adversary knows about the actual location, the less he cannot learn about the location from an output of the mechanism.
8.3 Pseudocode of GEM

References
Akiba, T., Iwata, Y., Kawarabayashi, K., Kawata, Y.: Fast shortest-path distance queries on road networks by pruned highway labeling. In: Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 147–154 (2014)
Cao, Y., Xiao, Y., Xiong, L., Bai, L.: PriSTE: from location privacy to spatiotemporal event privacy. arXiv preprint arXiv:1810.09152 (2018)
Chatzikokolakis, K., Palamidessi, C., Stronati, M.: Constructing elastic distinguishability metrics for location privacy. Proc. Priv. Enhancing Technol. 2, 156–170 (2015)
Cho, H.J., Chung, C.W.: An efficient and scalable approach to CNN queries in a road network. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 865–876 (2005)
Dwork, C.: Differential privacy. In: van Tilborg, H.C.A., Jajodia, S. (eds.) Encyclopedia of Cryptography and Security, pp. 338–340. Springer, Boston (2011). https://doi.org/10.1007/978-1-4419-5906-5
Chatzikokolakis, K., Palamidessi, C., Stronati, M.: A predictive differentially-private mechanism for mobility traces. In: De Cristofaro, E., Murdoch, S.J. (eds.) PETS 2014. LNCS, vol. 8555, pp. 21–41. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08506-7_2
Chatzikokolakis, K., Andrés, M.E., Bordenabe, N.E., Palamidessi, C.: Broadening the scope of differential privacy using metrics. In: De Cristofaro, E., Wright, M. (eds.) PETS 2013. LNCS, vol. 7981, pp. 82–102. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39077-7_5
Kairouz, P., Oh, S., Viswanath, P.: The composition theorem for differential privacy. IEEE Trans. Inf. Theor. 63(6), 4037–4049 (2017)
Kolahdouzan, M., Shahabi, C.: Voronoi-based k nearest neighbor search for spatial network databases. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30, pp. 840–851 (2004)
Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Geo-indistinguishability: differential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, pp. 901–9134 (2013)
Manasse, M., McSherry, F., Talwar, K.: Consistent weighted sampling. Technical report, MSR-TR-2010-73, June 2010
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 94–103, October 2007
Bordenabe, N.E., Chatzikokolakis, K., Palamidessi, C.: Optimal geo-indistinguishable mechanisms for location privacy. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 251–262 (2014)
Oya, S., Troncoso, C., Pérez-González, F.: Is geo-indistinguishability what you are looking for? In: Proceedings of the 2017 on Workshop on Privacy in the Electronic Society, pp. 137–140 (2017)
Papadias, D., Zhang, J., Mamoulis, N., Tao, Y.: Query processing in spatial network databases. In: Proceedings of the 29th International Conference on Very Large Data Bases, pp. 802–813 (2003)
Shokri, R., Theodorakopoulos, G., Boudec, J.Y.L., Hubaux, J.P.: Quantifying location privacy. In: Proceedings of the IEEE Symposium on Security and Privacy, pp. 247–262 (2011)
Shokri, R., Theodorakopoulos, G., Troncoso, C., Hubaux, J.P., Boudec, J.Y.L.: Protecting location privacy: optimal strategy against localization attacks. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp. 617–627 (2012)
To, H., Ghinita, G., Shahabi, C.: A framework for protecting worker location privacy in spatial crowdsourcing. Proc. VLDB Endowment 7(10), 919–930 (2014)
Tong, W., Hua, J., Zhong, S.: A jointly differentially private scheduling protocol for ridesharing services. IEEE Trans. Inf. Forensics Secur. 12(10), 2444–2456 (2017)
Tyagi, A.K., Sreenath, N.: Location privacy preserving techniques for location based services over road networks. In: Proceedings of International Conference on Communications and Signal Processing (ICCSP), pp. 1319–1326, April 2015
Wang, T., Liu, L.: Privacy-aware mobile services over road networks. Proc. VLDB Endowment 2(1), 1042–1053 (2009)
Wen, J., Li, Z.: A method of location privacy protection in road network environment. In: 2018 International Conference on Smart Materials, Intelligent Manufacturing and Automation (SMIMA), vol. 173, p. 03048 (2018)
Wu, W., Li, B., Chen, L., Zhang, C., Yu, P.S.: Improved Consistent Weighted Sampling Revisited. arXiv:1706.01172 [cs], June 2017
Xiao, Y., Xiong, L.: Protecting locations with differential privacy under temporal correlations. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security - CCS 2015, pp. 1298–1309 (2015)
Yu, L., Liu, L., Pu, C.: Dynamic differential location privacy with personalized error bounds. In: Proceedings of the Symposium on Network and Distributed System Security (NDSS) (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 IFIP International Federation for Information Processing
About this paper
Cite this paper
Takagi, S., Cao, Y., Asano, Y., Yoshikawa, M. (2019). Geo-Graph-Indistinguishability: Protecting Location Privacy for LBS over Road Networks. In: Foley, S. (eds) Data and Applications Security and Privacy XXXIII. DBSec 2019. Lecture Notes in Computer Science(), vol 11559. Springer, Cham. https://doi.org/10.1007/978-3-030-22479-0_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-22479-0_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22478-3
Online ISBN: 978-3-030-22479-0
eBook Packages: Computer ScienceComputer Science (R0)