1 Introduction

With the rapid development of the Internet of Things, mobile devices equipped with diverse embedded sensors (e.g., camera, accelerometer, compass) have become pervasive. Mobile crowdsensing (MCS) [4, 13] has recently emerged as a promising pervasive sensing paradigm for enabling the Internet of Things, which facilitates spatiotemporal data collection in large urban areas, such as transportation monitoring [28], air monitoring [12] and noise mapping [14]. A typical MCS system consists of a central platform residing in the cloud and a large number of distributed mobile device users. According to the sensing requests of the platform, users continuously collect location-based sensing data and submit them to the platform, which extracts useful information from them.

A major concern in spatiotemporal data collection via MCS is privacy leakage [7, 16], since the spatiotemporal sensing data collected by users contain their private information, such as their trajectories and preferences. Moreover, an untrusted platform in the MCS system should be considered, and hence privacy protection must be conducted by each user independently. Note that both the location tags and the sensed values contained in users' sensing data should be perturbed before they are submitted to the platform. On one hand, the trajectory privacy of a user will be exposed to the untrusted platform if sensing data with unperturbed location tags are continuously submitted. On the other hand, sensed values also leak the location privacy of users unexpectedly, since the location of a user may be inferred from the values of collected data by adopting truth discovery methods. The intuition behind inferring user locations is that sensing data collected at the same location tend to have close values, while the values of sensing data collected at different locations may be discrepant. Thus, the sensed values collected by users should also be sanitized before they are submitted to the platform.

To address the privacy concern in MCS, several privacy preserving approaches [5, 11, 18, 21, 22, 27] have been proposed based on differential privacy (DP) [2, 15], which is an effective tool to provide valid privacy protection while ensuring the usability of aggregated sensing data. These DP-based approaches typically assume that the platform is a trusted third party, responsible for sanitizing collected sensing data and limiting the disclosure of users' private information. However, this assumption does not hold in reality, as the platform may leak the privacy of users for commercial benefits or be attacked by adversaries. Other approaches [9, 10, 17, 19, 20, 23, 25] are based on local differential privacy (LDP) [3, 6], in which users perturb their sensing data locally and independently before submitting them to the platform, so that the private information of users is protected. Moreover, truth discovery methods [8] can be adopted by the platform to extract true values from the perturbed data. However, few works consider preserving the privacy contained in both the locations and the sensed values of users at the same time.

In this work, we consider an MCS system with an untrusted platform, in which location-based sensing data are continuously collected from mobile users. Each datum submitted by a user consists of the identity of the user, the sensed value of a monitored object, the location tag and the time stamp. With the sensed values collected by multiple users at an interested location, the platform applies a truth discovery method to obtain the estimated value of the monitored object. Obviously, the trajectory of a user, i.e., the succession of his/her timestamped locations, can be revealed to the untrusted platform over time. Moreover, even if the locations of users are perturbed, the platform can still infer the trajectories of users from the sensed values in the submitted data. Since few works consider the problem that the sensing data of users may lead to such unexpected location privacy leakage, we design a privacy protection approach for online spatiotemporal data collection, which protects the location privacy of participating users by perturbing both the sensed values and the locations in their submitted data.

However, the joint location-value privacy protection problem in MCS is particularly difficult due to the following challenges. Firstly, the locations contained in the sensing data submitted to the platform should be perturbed to protect user privacy, which leads the platform to match the collected sensing data to wrong locations; consequently, the accuracy of values estimated from sanitized sensing data with perturbed locations is impacted. Secondly, since there is no trusted third party, the joint location-value privacy protection approach must be performed by each user locally and independently, which makes truth discovery at the platform particularly difficult. Finally, there exists an intrinsic tradeoff between the level of privacy protection and the utility of perturbed data; in other words, a high level of privacy protection inevitably decreases the utility of the sensing data, i.e., the accuracy of the estimated values.

In response to these challenges, we propose a privacy protection approach for online spatiotemporal data collection via MCS, which provides a location privacy preserving mechanism and a value privacy preserving mechanism. Specifically, the location privacy preserving mechanism is designed based on randomized response, so that each user can perturb his/her locations locally. In the Gaussian-mechanism-based value privacy preserving mechanism, each user sanitizes the collected sensed values by independently adding random Gaussian noise. Spatiotemporal sensing data with perturbed locations and sanitized sensed values are continuously submitted to the platform.

The main contributions of this work can be summarized as follows:

  • We consider the privacy preserving problem in an MCS system that collects location-based sensing data over time, in which we observe that not only the location tags but also the sensed values submitted to an untrusted platform will expose the private information of users.

  • We propose an LDP-based privacy protection approach, which includes two privacy preserving mechanisms to perturb location tags and sensed values, respectively. The approach can be performed by each user locally and independently. We theoretically prove that both mechanisms satisfy local differential privacy.

  • Extensive simulations are conducted to validate the performance of our proposed privacy protection approach. The simulation results show that the privacy of users is well preserved and the values estimated by the truth discovery method are relatively accurate.

This paper is organized as follows. We first discuss related works in Sect. 2, and present the motivation of joint location-value privacy protection in Sect. 3. Then, we present our system model and some preliminaries in Sect. 4. Section 5 and Sect. 6 elaborate our proposed privacy protection approach and the theoretical analysis, respectively. Finally, simulation results are presented in Sect. 7, and the paper is concluded in Sect. 8.

2 Related Work

Privacy protection has received a lot of attention in MCS, and differential privacy is seen as a promising technology in recent studies [5, 9, 10, 11, 17, 18, 19, 20, 21, 22, 23, 25, 27]. These privacy protection approaches in MCS can be classified into two categories, i.e., DP-based approaches and LDP-based approaches.

2.1 DP-Based Approaches in MCS

DP-based privacy preserving approaches, which assume the existence of a trusted third party (e.g., a platform or a central server), have been widely adopted in many areas [1, 26]. In the MCS system, several DP-based approaches [5, 11, 18, 21, 22, 27] sanitize the sensing data collected from mobile users for privacy protection.

To et al. [18] introduce a framework for protecting the location privacy of workers participating in spatial crowdsourcing tasks, which requires the users' cellular service providers to coordinate between users and MCS platforms. Wang et al. [22] study the privacy protection problem in a crowd-sourced system for continuous real-time spatiotemporal data publishing, and propose an online privacy preserving scheme to monitor population statistics over infinite streams. An enhanced RescueDP framework is then proposed in [21], which leverages neural networks to accurately predict the values of statistics and improve the utility of released data. In [5, 11, 27], privacy preserving auction-based incentive mechanisms are designed to preserve the users' bid privacy. Specifically, the mechanism designed by Jin et al. [5] approximately minimizes the platform's total payment with a guaranteed approximation ratio. Besides, Lin et al. [11] propose two score functions to realize frameworks for privacy-preserving auction-based incentive mechanisms that achieve approximate social cost minimization. Differently, the joint effect of users' privacy concerns and the positive network effect is considered in [27].

However, the assumption of a trusted third party is sometimes impractical, as the platform may leak the privacy of users for commercial interests or be attacked by adversaries.

2.2 LDP-Based Approaches in MCS

Recently, LDP-based approaches toward data statistics and analysis in MCS have been widely adopted to alleviate the privacy concerns caused by an untrusted third party [24]. Mobile users can sanitize their private sensing data locally and submit the perturbed data to the platform.

Some works [17, 23] focus on privacy preserving data distribution estimation with LDP in MCS. Wang et al. [23] provide an optimal LDP-based privacy preserving mechanism for distribution estimation over user-contributed data, in which the private information of users contained in both qualitative data and discrete quantitative data can be protected. Ren et al. [17] develop LDP-based privacy-preserving algorithms for multi-dimensional data distribution estimation and data publication, which achieve high computation efficiency and data utility.

The works in [19, 20, 25] design privacy-preserving frameworks to satisfy the privacy demands of users. To protect the location privacy of users, the authors of [20] design an LDP-based privacy-preserving framework consisting of a data adjustment function and an optimal location obfuscation scheme, and propose an inference algorithm to improve the inference accuracy of obfuscated data. The work in [19] leverages distortion privacy together with differential privacy to provide more comprehensive protection for users' location privacy. Differently, a privacy-preserving task allocation framework for MCS is proposed in [25], which provides personalized location privacy protection to meet the different demands of users. Moreover, Lin et al. [10] propose a randomized-response-based privacy-preserving crowdsensing data collection and analysis method to ensure users' privacy, and Li et al. [9] provide a privacy preserving truth discovery mechanism with theoretical guarantees of both utility and privacy.

Unfortunately, few works consider that both the location tags and the sensed values contained in sensing data may unexpectedly disclose the private information of users, or propose a joint location-value privacy protection approach accordingly.

3 Motivation

Fig. 1. An example of inferring the real location of a user based on sensed values, even though the location is perturbed.

In this section, we emphasize that joint location-value privacy protection is necessary for spatiotemporal data collection via MCS. Perturbing only the locations contained in the sensing data collected by a user is not enough, as the platform can infer the real locations of users from the sensed values. Here, we give a simple example to illustrate how the platform infers the real location of a user based on his/her sensed values, even though the location in the submitted sensing data is perturbed.

Example: Suppose a platform requires users to collect ambient noise at various interested locations. A location privacy preserving mechanism is provided to perturb their original locations to other possible locations with a certain probability. Assume a mobile user collects 20 dB, 35 dB, 30 dB, and 15 dB of ambient noise at locations A, B, B, and C over four time slots, i.e., ambient noise is collected twice at location B. The location of the user in the second time slot (i.e., location B) is perturbed to location F. The real trajectory and the perturbed trajectory of the user are \(A\rightarrow B\rightarrow B\rightarrow C\) and \(A\rightarrow F\rightarrow B\rightarrow C\), respectively. The sensing data submitted to the platform are shown in Fig. 1. In addition, we assume the platform can obtain relatively accurate estimations of the ambient noise at each location over time.

According to the estimated values in the second time slot, the platform can easily find that F is not the real location of the user. In addition, according to the locations of the user in the first and third time slots, the platform can infer that the possible location of the user in the second time slot is D, B, or E. Then, by comparing the estimated values of these three locations with the sensed value collected by the user in the second time slot, the platform can successfully infer that the real location of the user in the second time slot is B.

4 System Model and Preliminaries

In this section, we present the model of online spatiotemporal data collection in MCS, and introduce some preliminaries including truth discovery and local differential privacy.

4.1 System Model

In this work, we consider a typical crowdsensing system consisting of a central platform located in the cloud and a set of registered users equipped with smart devices. We denote the set of users by \(\mathcal {U}=\{u_1,u_2,\cdots ,u_n\}\). Users are mobile and distributed in an urban area. The platform requires users to continuously collect location-based and time-sensitive sensing data around several interested locations. The interested locations in the urban area are represented as \(\mathcal {L}=\{L_1,L_2,\cdots ,L_m\}\), where m is the number of interested locations. For convenience, we divide time into equal-interval time slots, i.e., \(\mathcal {T}=\{t_1,t_2,\cdots ,t_{\tau },\cdots \}\). In each time slot \(t_{\tau }\), the subset of users located around location \(L_j\in \mathcal {L}\) is denoted as \(\mathcal {U}_j^{\tau }\subseteq \mathcal {U}\).

Let \(u_i\) denote a user located around interested point \(L_j\) in time slot \(t_{\tau }\) (i.e., \(u_i\in \mathcal {U}_j^{\tau }\)). We denote the location of user \(u_i\) in \(t_{\tau }\) as \(l_i^{\tau }\), and we use the location of his/her nearby interested point to replace it, i.e., \(l_i^{\tau }=L_j\). The sensed value of sensing data collected by user \(u_i\) in time slot \(t_{\tau }\) is represented by \(v_i^{\tau }\). Each user submits the identity, the sensed value, the location tag, and the time stamp to the platform in real time.

Submitting original sensed values and locations will expose users' private information (e.g., trajectories) to the platform and adversaries, since an untrusted platform may leak the privacy of users for commercial or financial benefits, or be attacked by adversaries. In this work, we consider that users preserve their private information by submitting sanitized values and perturbed locations to the platform. Specifically, the perturbed location and the sanitized value of user \(u_i\) in time slot \(t_{\tau }\) are denoted by \(\tilde{l}_i^{\tau }\) and \(\tilde{v}_i^{\tau }\), respectively. Note that we assume \(\tilde{l}_i^{\tau }\in \mathcal {L}\).

After receiving all sensing data collected in \(t_{\tau }\), the platform aggregates the sanitized value \(\tilde{v}_i^{\tau }\) of user \(u_{i}\) according to the perturbed location \(\tilde{l}_i^{\tau }\). Specifically, we define the set of users whose perturbed location is \(L_j\) as \(\tilde{\mathcal {U}}_j^{\tau } = \{u_i\in \mathcal {U}|\tilde{l}_i^{\tau }=L_j\}\). According to the sanitized values \(\{\tilde{v}_i^{\tau }|u_i\in \tilde{\mathcal {U}}_j^{\tau }\}\) collected at \(L_j\), the platform obtains the estimated value \(\bar{V}_j^{\tau }\) at location \(L_j\) via truth estimation as follows,

$$\begin{aligned} \bar{V}_j^{\tau } = \frac{\sum _{u_i\in \tilde{\mathcal {U}}_j^{\tau }} \tilde{w}_i^{\tau } \cdot \tilde{v}_i^{\tau }}{\sum _{u_i\in \tilde{\mathcal {U}}_j^{\tau }} \tilde{w}_i^{\tau }}, \end{aligned}$$
(1)

where \(\tilde{w}_i^{\tau }\) is the weight of user \(u_i\), calculated based on the sanitized value at time slot \(t_{\tau }\). Correspondingly, we denote by \(w_i^{\tau }\) the weight of user \(u_{i}\) calculated based on the original sensed value at time slot \(t_{\tau }\).

4.2 Preliminaries

Truth Discovery [8]: Given an initialization of the weights of users, the truth discovery method iteratively conducts the following steps until the estimated value converges.

  • Truth estimation: Given the weights of users and sanitized values collected in location \(L_j\) at time slot \(t_{\tau }\), the estimated value \(\bar{V}_j^{\tau }\) is calculated as (1).

  • Weight update: According to the difference between the sanitized values submitted to the platform and the estimated value \(\bar{V}_j^{\tau }\) of the monitored object, the weight of user \(u_i\) is updated as follows (see the sketch after this list):

    $$\begin{aligned} \tilde{w}_i^{\tau }=\log \left( \frac{\sum _{u_r\in \tilde{\mathcal {U}}_j^{\tau }} (\tilde{v}_r^{\tau } - \bar{V}_j^{\tau })^2}{(\tilde{v}_i^{\tau } - \bar{V}_j^{\tau })^2} \right) . \end{aligned}$$
    (2)
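The following Python sketch illustrates the iterative truth discovery procedure for the sanitized values mapped to a single location and time slot. The function name, the fixed iteration count (used in place of an explicit convergence test), and the small constant guarding against division by zero are our own assumptions, not part of the original method description.

```python
import numpy as np

def truth_discovery(values, iters=10, eps=1e-12):
    """Estimate the true value from sanitized values reported at one
    location and time slot, alternating Eq. (1) and Eq. (2)."""
    values = np.asarray(values, dtype=float)
    weights = np.ones_like(values)          # equal initial weights
    for _ in range(iters):
        # Truth estimation, Eq. (1): weighted average of reported values.
        estimate = np.sum(weights * values) / np.sum(weights)
        # Weight update, Eq. (2): users closer to the estimate get larger weights.
        sq_err = (values - estimate) ** 2 + eps
        weights = np.log(np.sum(sq_err) / sq_err)
        weights = np.maximum(weights, eps)  # keep weights strictly positive
    return estimate

# Example: five users report noisy readings of a roughly 40 dB noise level.
print(truth_discovery([39.5, 40.2, 40.1, 41.0, 55.0]))
```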

Local Differential Privacy [3]: LDP is a promising technology used to provide privacy protection with a quantified guarantee, which is applied to the systems without a trusted third party.

Let M(x) denote the perturbed output of a randomization mechanism M given an input x. M achieves \((\epsilon ,\delta )\)-LDP if it satisfies the following definition.

Definition 1

(\((\epsilon ,\delta )\)-LDP). A randomization mechanism M with output domain range(M) achieves \(\left( \epsilon ,\delta \right) \)-LDP if, for any pair of inputs x, y and any possible subset \(S\subseteq range(M)\),

$$\begin{aligned} \mathop {Pr}\{M(x)\in S\}\le e^{\epsilon }\cdot \mathop {Pr}\{M(y)\in S\}+\delta , \end{aligned}$$
(3)

where \(\epsilon >0\) is the privacy budget and \(\delta \ge 0\) is the relaxation variable.

In particular, a randomization mechanism M satisfies \(\epsilon \)-LDP when \(\delta =0\). Note that lower values of the privacy budget \(\epsilon \) and of \(\delta \) indicate a stronger privacy protection level, and vice versa.

5 Methodology

In this section, we first introduce the overview of our proposed privacy protection approach, which includes two mechanisms for preserving location privacy and sensed value privacy of users, respectively. Then, we describe the detailed designs of these two mechanisms in the next two subsections.

5.1 Overview

Fig. 2. An illustration of the MCS system and our proposed privacy protection approach.

As shown in Fig. 2, our MCS system consists of a central platform residing in the cloud and a set of mobile smart device users distributed in an urban area. Sensing data around interested locations are continuously collected by nearby users and submitted to the platform. In each time slot, the operations conducted by each user and by the platform are illustrated in the following.

Each user first collects the sensed value \(v_i^{\tau }\) of the monitored object at his/her current location \(l_i^{\tau }\). Then, each user locally and independently performs the location privacy preserving mechanism (LPPM) and the value privacy preserving mechanism (VPPM) to perturb the location and sanitize the sensed value as \(\tilde{l}_i^{\tau }\) and \(\tilde{v}_i^{\tau }\), respectively. Finally, the perturbed location and the sanitized sensed value, together with the identity of the user and the time stamp, are submitted to the platform.

After receiving all sanitized sensing data, the platform first aggregates the sanitized values according to the perturbed locations of users. Then, based on the sanitized values \(\{\tilde{v}_i^{\tau }|u_i\in \tilde{\mathcal {U}}_j^{\tau }\}\) at each location, the platform conducts the truth discovery method to estimate the true value \(\bar{V}_j^{\tau }\) of the monitored object at location \(L_{j}\) in time slot \(t_{\tau }\).

By conducting our privacy protection approach online in an MCS system, we can guarantee that the private information of users is preserved while the value of the monitored object can still be estimated accurately.

5.2 Location Privacy Preserving Mechanism (LPPM)

The original location tags contained in the sensing data collected by users over time will disclose their trajectories to the platform or adversaries, which may pose severe threats to their real life and to public security. In order to protect the location information of users, we provide an LPPM based on randomized response [3]. The main idea of this mechanism is to perturb the original location of a user to another interested location with a certain probability. The details are illustrated in the following.

We represent our LPPM by a function A, whose input domain and output range are both \(\mathcal {L}\). Given a predefined probability \(p\in (0,1)\), we perturb the original location \(l_i^{\tau }=L_j\in \mathcal {L}\) of user \(u_i\) in time slot \(t_{\tau }\) as follows,

$$\begin{aligned} \tilde{l}_i^{\tau } = A(l_i^{\tau },p)=\left\{ \begin{array}{ll} L_j,&{}\text {with probability } 1-p, \\ L_r \in \mathcal {L}\setminus \{L_j\},&{} \text {with probability } \frac{p}{m-1}. \end{array}\right. \end{aligned}$$
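As a concrete illustration, the following Python sketch (function and variable names are ours) implements the randomized-response rule above over a list of interested locations:

```python
import random

def perturb_location(true_location, locations, p):
    """LPPM sketch: keep the true location with probability 1 - p,
    otherwise report one of the other m - 1 locations uniformly."""
    if random.random() < 1 - p:
        return true_location
    others = [loc for loc in locations if loc != true_location]
    return random.choice(others)

# Example: perturb location "B" among 10 interested locations with p = 0.3.
locations = [chr(ord("A") + k) for k in range(10)]
print(perturb_location("B", locations, p=0.3))
```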

In Sect. 6, we prove that our location privacy preserving mechanism satisfies LDP. Note that although the location tags of users are perturbed, the platform can still extract accurate estimated values at different locations by applying the truth discovery method, because a sensed value with a mismatched location will be assigned a low weight in truth discovery and thus has little impact on the accuracy of the estimated result.

5.3 Value Privacy Preserving Mechanism (VPPM)

The trajectory of a user can still be inferred by the platform or adversaries by comparing the sensed values collected by the user over time with the estimated true values obtained by truth discovery. Thus, besides perturbing the locations of users, their sensed values should be sanitized as well. In this subsection, we propose an LDP-Gaussian-based VPPM to sanitize the sensed values of users. The main idea of this mechanism is to add noise to the sensed values to obtain a sanitized version of them, where the noise is sampled by each user from his/her private Gaussian distribution. The details of this mechanism are illustrated as follows.

In each time slot \(t_{\tau }\), the platform first publishes a predefined parameter \(\lambda \) to all users, where \(\lambda \) is determined by the specific privacy demands (i.e., privacy budget \(\epsilon _{2}\) and \(\delta \)). Then, each user \(u_i\) generates a private Gaussian distribution \(\mathcal {N}(0,\sigma _i^2)\) locally, where \(\sigma _i^2\) is sampled from the exponential distribution \(\mathcal {E}(\lambda )\) according to the published parameter. Finally, user \(u_i\) independently samples noise \(\zeta _i^{\tau }\) from his/her private Gaussian distribution and adds the noise to the sensed value. In summary, letting function B denote the VPPM, the process can be formulated as

$$\begin{aligned} \tilde{v}_i^{\tau }= & {} B(v_i^{\tau },\sigma ^{2}_{i}) = v_i^{\tau }+\zeta _i^{\tau }, \\&\text {where } \zeta _i^{\tau } \sim \mathcal {N}(0,\sigma _i^2) \text { and } \sigma _i^2 \sim \mathcal {E}(\lambda ). \nonumber \end{aligned}$$
(4)

Intuitively, a larger value of parameter \(\lambda \) indicates a smaller expectation of \(\sigma _i^2\), which leads to a smaller expectation of noise added to sensed values and a lower privacy protection level accordingly.
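A minimal Python sketch of this sampling procedure follows; the function name and the use of NumPy's default generator are our own assumptions, and the exponential distribution is parameterized by its rate \(\lambda \) (i.e., mean \(1/\lambda \)), consistent with the intuition above.

```python
import numpy as np

def sanitize_value(true_value, lam, rng=None):
    """VPPM sketch: draw sigma_i^2 ~ E(lambda), then add Gaussian noise
    zeta_i ~ N(0, sigma_i^2) to the sensed value."""
    rng = rng or np.random.default_rng()
    sigma_sq = rng.exponential(scale=1.0 / lam)            # sigma_i^2 ~ E(lambda)
    noise = rng.normal(loc=0.0, scale=np.sqrt(sigma_sq))   # zeta_i ~ N(0, sigma_i^2)
    return true_value + noise, sigma_sq

# Example: sanitize a 40 dB reading with rate parameter lambda = 0.02.
print(sanitize_value(40.0, lam=0.02))
```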

So far, the locations and sensed values of users submitted to the platform are perturbed. Then, the platform can use the aforementioned truth discovery method to estimate the true values in different locations. Our privacy protection approach is summarized in Algorithm 1.

Algorithm 1. The proposed privacy protection approach.
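The per-user step of Algorithm 1 can be sketched as follows, reusing the hypothetical perturb_location and sanitize_value helpers from the sketches above (the function name and tuple layout are ours):

```python
def submit_sensing_datum(user_id, true_location, true_value,
                         locations, p, lam, timestamp):
    """Per-user step: perturb the location with the LPPM, sanitize the
    value with the VPPM, and return the tuple to submit to the platform."""
    perturbed_loc = perturb_location(true_location, locations, p)
    sanitized_val, _ = sanitize_value(true_value, lam)
    return (user_id, sanitized_val, perturbed_loc, timestamp)
```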

6 Theoretical Analysis

In the following, we theoretically show that both the location and the value privacy preserving mechanisms satisfy LDP.

Theorem 1

Given a set of locations whose size is m, the LPPM with perturbation probability p satisfies \(\epsilon _{1}\)-local differential privacy, where \(\epsilon _{1}=\ln ({(1-p)(m-1)}/{p})\).

Proof

According to Eq. (3), the LPPM satisfies LDP if, for any two possible input locations \(L_{j}\) and \(L_{r}\), we can bound the probability ratio \(\mathop {Pr}\{A(L_{j})= \tilde{l}_{i}^{\tau }\}/\mathop {Pr}\{A(L_{r})=\tilde{l}_{i}^{\tau }\}\). The ratio is maximized when function A outputs a perturbed location \(\tilde{l}_{i}^{\tau }\) that is identical to one of the input locations, i.e., when \(L_{j} \ne L_{r}\) and \(\tilde{l}_{i}^{\tau }=L_j\). Then we have,

$$\begin{aligned} \frac{\mathop {Pr}\{A(L_{j})= \tilde{l}_{i}^{\tau }\}}{\mathop {Pr}\{A(L_{r}) =\tilde{l}_{i}^{\tau }\}} \le \frac{\mathop {Pr}\{A(L_{j})=L_{j}\}}{\mathop {Pr}\{A(L_{r})=L_{j}\}}=\frac{1-p}{\frac{p}{m-1}} \end{aligned}$$

Thus, the LPPM satisfies \(\epsilon _{1}\)-LDP with \(\epsilon _{1}=\ln ({(1-p)(m-1)}/{p})\).

From Theorem 1, we can observe that when the perturbation probability p becomes larger or the size of the location set becomes smaller, the value of \(\epsilon _{1}\) becomes smaller, which indicates a higher level of privacy protection, and vice versa.
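For instance, a one-line helper (ours) evaluates the budget for the simulation defaults used later in Sect. 7 (p = 0.3, m = 10):

```python
import math

def lppm_epsilon(p, m):
    """Privacy budget of the LPPM, per Theorem 1."""
    return math.log((1 - p) * (m - 1) / p)

print(lppm_epsilon(0.3, 10))  # ln(21), approximately 3.04
```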

In what follows, we present the theoretical analysis of the VPPM at each location and each time slot. We first introduce some parameters used only for the theoretical analysis. Generally, the value \(v_{i}^{\tau }\) of sensing data collected by user \(u_{i}\) at location \(L_{j}\) follows the Gaussian distribution \(\mathcal {N}(V^{\tau }_{j_{truth}},{\rho ^{\tau }_{j}}^{2})\) [29], where \(V^{\tau }_{j_{truth}}\) and \({\rho ^{\tau }_{j}}^{2}\) represent the ground truth and the error variance at \(L_{j}\), respectively. Then, we give the definition of L1-Sensitivity as follows.

Definition 2 (L1-Sensitivity)

L1-Sensitivity \(\varDelta ^{\tau }_{j}\) of a user in \(L_{j}\) at time slot \(t_{\tau }\) is defined as

$$\begin{aligned} \varDelta ^{\tau }_{j}=\max _{v^{\tau }_{i},\grave{v}^{\tau }_{i}\in \mathcal {D}^{\tau }_{j}}|v^{\tau }_{i}-\grave{v}^{\tau }_{i}|, \end{aligned}$$

where \(\mathcal {D}^{\tau }_{j}\) is the range of values that may be sensed by users in \(L_{j}\) at \(t_{\tau }\), and \(v^{\tau }_{i}\) and \(\grave{v}^{\tau }_{i}\) are two possible values of sensing data collected by \(u_{i}\).

Obviously, \(\varDelta ^{\tau }_{j}\) depends on \(\rho ^{\tau }_{j}\). We present the relation between \(\rho ^{\tau }_{j}\) and \(\varDelta ^{\tau }_{j}\) in the following lemma.

Lemma 1

The L1-Sensitivity \(\varDelta ^{\tau }_{j}\) is smaller than \(a\sqrt{2}\rho ^{\tau }_{j}\) with probability at least \(1-\frac{1}{a}e^{\frac{-a^{2}}{2}}\), where \(a \ge 0\) is decided by the sensed values collected by the users.

Proof

As described above, the error between the sensed value \(v_{i}^{\tau }\) and \(V^{\tau }_{j_{truth}}\) follows the Gaussian distribution \(\mathcal {N}(0,{\rho ^{\tau }_{j}}^{2})\), i.e., \(v_{i}^{\tau }\sim \mathcal {N}(V^{\tau }_{j_{truth}},{\rho ^{\tau }_{j}}^{2})\). Hence, for any two possible values \(v^{\tau }_{i}\) and \(\grave{v}^{\tau }_{i}\) that may be sensed by \(u_{i}\), the difference between \(v^{\tau }_{i}\) and \(\grave{v}^{\tau }_{i}\) follows the Gaussian distribution \(\mathcal {N}(0,2{\rho ^{\tau }_{j}}^{2})\). Based on the Gaussian tail bound, we have,

$$\begin{aligned} Pr\{|v^{\tau }_{i}-\grave{v}^{\tau }_{i}|>a\sqrt{2}\rho ^{\tau }_{j}\}\le \frac{1}{a}e^{\frac{-a^{2}}{2}}, \end{aligned}$$
(5)

where \(a \ge 0\) is decided by the values of sensing data collected by users. Thus, the lemma is proved.

Next, we take \(\varDelta ^{\tau }_{j}=a\sqrt{2}\rho _{j}^{\tau }\) to analyze the LDP property achieved by the VPPM.

Theorem 2

Given the L1-Sensitivity \(\varDelta ^{\tau }_{j}\) and an exponential distribution with parameter \(\lambda \), the VPPM satisfies \(\left( \epsilon _{2},\delta \right) \)-local differential privacy, where \(\epsilon _{2}\ge \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\sigma ^{2}_{i}}\) and \(\delta >1-e^{\frac{-\lambda {\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}}\).

Proof

According to Eq. (4), user \(u_{i}\) adopts the VPPM to add noise sampled from the Gaussian distribution \(\mathcal {N}(0,\sigma ^{2}_{i})\) to \(v^{\tau }_{i}\), obtaining the sanitized value \(\tilde{v}^{\tau }_{i}\). Besides, the noise variance \(\sigma ^{2}_{i}\) is sampled from an exponential distribution with parameter \(\lambda \). For any two possible sensed values \(v^{\tau }_{i}\) and \(\grave{v}^{\tau }_{i}\), we have,

$$\begin{aligned}&\frac{\mathop {Pr}\{B(v^{\tau }_{i},\sigma ^{2}_{i})=\tilde{v}^{\tau }_{i}\}}{\mathop {Pr}\{B(\grave{v}^{\tau }_{i},\sigma ^{2}_{i})=\tilde{v}^{\tau }_{i}\}} =\frac{\frac{1}{\sqrt{2\pi }\sigma _{i}}e^{-\frac{(\tilde{v}^{\tau }_{i}-v^{\tau }_{i})^2}{2\sigma ^{2}_{i}}}}{\frac{1}{\sqrt{2\pi }\sigma _{i}}e^{-\frac{(\tilde{v}^{\tau }_{i}-\grave{v}^{\tau }_{i})^2}{2\sigma ^{2}_{i}}}} \\= & {} e^{\frac{(\tilde{v}^{\tau }_{i}-\grave{v}^{\tau }_{i})^2-(\tilde{v}^{\tau }_{i}-v^{\tau }_{i})^2}{2\sigma ^{2}_{i}}} \le e^{\frac{(v^{\tau }_{i}-\grave{v}^{\tau }_{i})^2}{2\sigma ^{2}_{i}}} \le e^{\frac{{\varDelta ^{\tau }_{j}}^{2}}{2\sigma ^{2}_{i}}}\le e^{\epsilon _{2}} \nonumber \end{aligned}$$
(6)

According to Eq. (6), when \(\sigma ^{2}_{i}\ge \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}\), mechanism B satisfies \(\epsilon _{2}\)-local differential privacy. Since \(\sigma ^{2}_{i}\) follows the exponential distribution with parameter \(\lambda \), we constrain the probability of the event \(\{\sigma ^{2}_{i}:\sigma ^{2}_{i}\ge \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}\}\) to be at least \(1-\delta \). Thus, \(\mathop {Pr}\{\sigma ^{2}_{i}\ge \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}\}=e^{-\frac{\lambda {\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}}\ge 1-\delta \), and therefore \(\lambda \le \frac{2\epsilon _{2}\ln (\frac{1}{1-\delta })}{{\varDelta ^{\tau }_{j}}^{2}}\).

Next, we partition \(\mathbb {R}^{+}\), the domain of the noise variance, as \(\mathbb {R}^{+}=R_1\cup R_2\), where \(R_1=\left\{ \sigma ^{2}_{i}\in \mathbb {R}^{+}: \sigma ^{2}_{i}\ge \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}\right\} \) and \(R_2=\left\{ \sigma ^{2}_{i}\in \mathbb {R}^{+}: \sigma ^{2}_{i}< \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}\right\} \). For subsets \(S_{1},S_{2}\subseteq \mathbb {S}\), where \(\mathbb {S}\) is the range of \(B(v^{\tau }_{i}, \sigma ^{2}_{i})\), we define \({S_{1}=\left\{ B(v^{\tau }_{i},\sigma ^{2}_{i})|\sigma ^{2}_{i}\in R_{1}\right\} }\) and \({S_{2}=\left\{ B(v^{\tau }_{i}, \sigma ^{2}_{i})|\sigma ^{2}_{i}\in R_{2}\right\} }\). Then, for any \(S\subseteq \mathbb {S}\), we have,

$$\begin{aligned} \begin{aligned}&\mathop {Pr}\limits _{\sigma ^{2}_{i}\in \mathbb {R}^{+}}\{B(v^{\tau }_{i},\sigma ^{2}_{i})\in S\}\\&= \mathop {Pr}\limits _{\sigma ^{2}_{i}\in R_{1}}\{B(v^{\tau }_{i},\sigma ^{2}_{i})\in S_{1}\}+\mathop {Pr}\limits _{\sigma ^{2}_{i}\in R_{2}}\{B(v^{\tau }_{i},\sigma ^{2}_{i})\in S_{2}\}\\&\le \mathop {Pr}\limits _{\sigma ^{2}_{i}\in R_{1}}\{B(v^{\tau }_{i},\sigma ^{2}_{i})\in S_{1}\}+\delta \\&\le e^{\epsilon _{2}}(\mathop {Pr}\limits _{\sigma ^{2}_{i}\in R_{1}}\{B(\grave{v}^{\tau }_{i},\sigma ^{2}_{i})\in S_{1}\})+\delta \\&\le e^{\epsilon _{2}}(\mathop {Pr}\limits _{\sigma ^{2}_{i}\in \mathbb {R}^{+}}\left\{ B(\grave{v}^{\tau }_{i},\sigma ^{2}_{i})\in S\right\} )+\delta , \end{aligned} \end{aligned}$$

Thus, mechanism B yields \((\epsilon _{2},\delta )\)-local differential privacy, where \(\epsilon _{2}\ge \frac{{\varDelta ^{\tau }_{j}}^{2}}{2\sigma ^{2}_{i}}\) and \(\delta >1-e^{\frac{-\lambda {\varDelta ^{\tau }_{j}}^{2}}{2\epsilon _{2}}}\).

From Theorem 2, we can find that when \(\sigma ^{2}_{i}\) becomes larger, the lower bound of \(\epsilon _{2}\) becomes smaller. In addition, the lower bound of \(\delta \) will be smaller when the value of \(\lambda \) is smaller. When \(\epsilon _{2}\) and \(\delta \) have smaller values, higher privacy protection can be achieved.
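As a practical consequence of the proof, the platform can pick the largest rate \(\lambda \) that still meets a given \((\epsilon _{2},\delta )\) requirement via the bound \(\lambda \le 2\epsilon _{2}\ln (\frac{1}{1-\delta })/{\varDelta ^{\tau }_{j}}^{2}\). The helper below (ours) evaluates this bound; the sensitivity value of 5 dB is a hypothetical example, not taken from the paper.

```python
import math

def max_lambda(epsilon2, delta, sensitivity):
    """Largest exponential rate lambda satisfying the bound derived in
    the proof of Theorem 2: lambda <= 2*eps2*ln(1/(1-delta)) / Delta^2."""
    return 2 * epsilon2 * math.log(1.0 / (1.0 - delta)) / sensitivity ** 2

# Example with the simulation defaults epsilon2 = 0.7, delta = 0.3
# and a hypothetical sensitivity Delta = 5 dB:
print(max_lambda(0.7, 0.3, 5.0))  # approximately 0.02
```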

7 Performance Evaluation

7.1 Simulation Setup

The default settings in our simulations are as follows. We consider an urban area consisting of 10 interested locations where ambient noise needs to be monitored, and the total number of users is 400. The sensed values of users are simulated by a Gaussian distribution \(\mathcal {N}(V^{\tau }_{j_{truth}},3)\), where the ground truth \(V^{\tau }_{j_{truth}}\) is uniformly distributed in [20, 100] dB. We set the perturbation probability p to 0.3 (i.e., \(\epsilon _{1}=\ln 21\approx 3.04\)), the privacy budget \(\epsilon _{2}\) to 0.7, and the relaxation variable \(\delta \) to 0.3. Besides, the weight of each user is initialized to 1 at the beginning of each time slot.
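For concreteness, a small sketch (ours) of how such synthetic readings could be generated under these settings, interpreting the second parameter of \(\mathcal {N}(\cdot ,3)\) as the variance and assigning users to locations uniformly at random (both assumptions of this sketch):

```python
import numpy as np

def simulate_sensed_values(n_users=400, n_locations=10, rng=None):
    """Draw one ground truth per location, uniform in [20, 100] dB,
    and user readings from N(truth, variance 3)."""
    rng = rng or np.random.default_rng()
    truths = rng.uniform(20.0, 100.0, size=n_locations)
    assignment = rng.integers(0, n_locations, size=n_users)
    values = rng.normal(loc=truths[assignment], scale=np.sqrt(3.0))
    return truths, assignment, values
```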

We compare our proposed approach with four baselines:

  • No Privacy Protection (NPP): Each user submits the original sensing data. The estimated values are then obtained by truth discovery.

  • Original Location with Sanitized Value (OLSV): Each user submits the original locations and the sanitized values obtained by the value privacy preserving mechanism to the platform. The estimated values are then obtained by truth discovery.

  • Perturbed Location with Original Value (PLOV): Each user submits the perturbed locations obtained by the location privacy preserving mechanism and the original sensed values to the platform. The estimated values are then obtained by truth discovery.

  • Privacy Protection with Mean (PPM): Each user submits the perturbed locations and the sanitized values obtained by our privacy preserving mechanisms to the platform. The estimated values are then obtained by averaging the sanitized values submitted by the users at each interested location.

To measure the performance achieved by the different approaches, we first adopt the commonly used Mean Absolute Error (MAE) as our metric, which measures the difference between the ground truth and the estimated values as

$$\begin{aligned} \mathrm {MAE}=\frac{1}{m}\sum _{j=1}^{m}\left| V_{j_{truth}}^{\tau }-\bar{V}_{j}^{\tau }\right| . \end{aligned}$$

Smaller MAE values indicate that the perturbation and sanitization have little impact on the accuracy of the estimated results for the monitored object across all interested locations.

Besides, we compare the average accuracy of estimated values under different settings, which is calculated as

$$\begin{aligned} Accuracy = \frac{1}{m}\sum _{j=1}^{m} \left( 1-\frac{|V_{j_{truth}}^{\tau }-\bar{V}_{j}^{\tau }|}{V_{j_{truth}}^{\tau }} \right) . \end{aligned}$$
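A minimal sketch of both metrics (function names are ours):

```python
import numpy as np

def mae(truth, estimates):
    """Mean absolute error between ground-truth and estimated values."""
    truth, estimates = np.asarray(truth, float), np.asarray(estimates, float)
    return np.mean(np.abs(truth - estimates))

def accuracy(truth, estimates):
    """Average relative accuracy, as defined above."""
    truth, estimates = np.asarray(truth, float), np.asarray(estimates, float)
    return np.mean(1.0 - np.abs(truth - estimates) / truth)

# Example over three locations:
print(mae([40.0, 60.0, 80.0], [41.0, 58.5, 80.5]))       # 1.0
print(accuracy([40.0, 60.0, 80.0], [41.0, 58.5, 80.5]))  # about 0.981
```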

7.2 Performance Evaluation

In the following, we first compare the MAE achieved by our privacy preserving approach and the other baselines under different numbers of interested locations. We plot the MAE and the accuracy achieved by the five approaches when the number of locations varies from 5 to 25 in Fig. 3 and Fig. 6, respectively. It can be found that the MAE and the accuracy of our approach remain stable, which indicates that our approach is scalable in the number of interested locations. Specifically, our approach achieves about \(94.61\%\) accuracy of estimated values, which is only \(8.13\%\), \(7.26\%\) and \(1.94\%\) lower than NPP, OLSV and PLOV, respectively, but \(4.67\%\) higher than PPM.

Fig. 3. MAE vs. number of locations.

Fig. 4. MAE vs. number of users.

Fig. 5. MAE vs. sample range of ground truth.

Fig. 6. Accuracy vs. number of locations.

Fig. 7. Accuracy vs. number of users.

Fig. 8. Accuracy vs. sample range of ground truth.

As shown in Fig. 4 and Fig. 7, we evaluate the performance of the five approaches by varying the total number of users from 200 to 1000. It can be observed that the MAE and the accuracy achieved by our approach remain stable regardless of the number of users, which indicates that our approach applies to large-scale MCS systems with plenty of users. Specifically, when there are 600 users, our approach achieves \(91.68\%\) accuracy, which is only \(7.51\%\), \(6.36\%\) and \(3.41\%\) lower than NPP, OLSV and PLOV, respectively.

To further study the impact of our privacy preserving mechanisms on estimation quality, we change the range of user sensed values, i.e., the range of \(V^{\tau }_{j_{truth}}\). The sampling interval of \(V^{\tau }_{j_{truth}}\) is [20, x], and we vary x from 30 dB to 110 dB. In Fig. 5 and Fig. 8, the MAE increases and the accuracy decreases as x becomes larger. This is because, as the sample range of \(V^{\tau }_{j_{truth}}\) increases, the sensed values of users become more diverse. Specifically, our approach achieves about \(92.39\%\) average accuracy of estimated values over the varied sample ranges of \(V^{\tau }_{j_{truth}}\), which is only \(6.38\%\), \(4.82\%\) and \(1.02\%\) lower than NPP, OLSV and PLOV, respectively, but \(2.97\%\) higher than PPM.

To summarize, although the performance of our approach is inevitably worse than that of NPP, OLSV and PLOV, it still achieves relatively high accuracy of estimated values while providing joint location-value privacy protection for users. Moreover, our approach always outperforms PPM, since we adopt a more reliable truth discovery method to eliminate the influence of unreliable or protected sensing data on the truth estimation.

8 Conclusion

In this work, we consider the joint location-value privacy protection problem in an MCS system with an untrusted platform, since not only the location tags but also the sensed values contained in users' spatiotemporal sensing data can expose their private information. Therefore, we propose a privacy protection approach comprising two privacy preserving mechanisms that perturb the locations and the sensed values of users, respectively. Specifically, the LPPM is designed based on randomized response, and the VPPM is designed based on the Gaussian mechanism. Both mechanisms are proved to satisfy local differential privacy. Moreover, we conduct extensive simulations to show that the true values at interested locations can be accurately estimated from the perturbed locations and sanitized sensed values by adopting the truth discovery method.