Protecting personal trajectories of social media users through differential privacy

doi:10.1016/j.cose.2017.02.002

Computers & Security

Volume 67, June 2017, Pages 142-163

https://doi.org/10.1016/j.cose.2017.02.002

Abstract

Road traffic congestion is an important issue in modern cities. However, most existing traffic jam identification solutions rely on expensive facilities such as sensors or transport probe infrastructure, with high deployment and management costs, and as a result they are not ubiquitously deployed. The extensive use of smart mobile devices with location-based capabilities and the global popularity of microblogging applications like Twitter offer an opportunity to tackle these problems. Twitter users can serve as human traffic sensors providing real-time reflections of current traffic situations. However, these data can contain extensive personal information, which demands privacy-preserving mechanisms for user locations and current trajectories. Differential privacy can ensure that a degree of privacy in these trajectories is preserved whilst still allowing data analysis and mining of the Twitter content. This paper proposes an innovative private trajectory release model and associated algorithms with differential privacy guarantees that consider both data privacy and data utility. This includes development of a private reference system for calibrating separate (raw) user trajectories across obfuscated anchor points; construction of privacy-supporting noise-enhanced prefix trees to release synthetic data privately; and comprehensive evaluation of both the accuracy and utility of the solutions using a set of evaluation metrics on real-life tweet-based user trajectories across the city of Melbourne.

Introduction

With the extensive use of smart mobile devices with location-based capabilities and the global popularity of micro-blogging applications like Twitter, new possibilities now exist for the collection of near real-time transport data. Tweets, especially those generated by car drivers, passengers and pedestrians near roads, can be regarded as indicators of real-time traffic conditions. This, however, raises privacy issues. It is vital that behavioural identification and pattern mining over trajectories provide traffic monitoring information while protecting individual privacy, particularly since such data might be used for malicious purposes against Tweeters. Differential privacy (Dwork et al. 2006) has been broadly adopted to protect the privacy of sensitive data, including location and trajectory data. It can rigorously protect the privacy of individual location and trajectory information whilst still providing synthesised data for analysis. Approaches that limit the danger of leaking location information would encourage more users to share their locations. However, when spatio-temporal data from social media are released and used to help monitor traffic congestion, several challenges emerge due to the heterogeneity and complex nature of trajectories. Firstly, de Montjoye et al. (2013) showed that over 90% of trajectories can be identified using no more than 4 given locations. Furthermore, coarse and fine sampling approaches lead to significantly different results. Consequently, it is crucial to propose privacy-preserving trajectory publication solutions for social media under differential privacy mechanisms whilst supporting the use of such data more generally, e.g. for congestion identification.

The goal of this paper is to support a privacy-preserving trajectory calibration and publication system that uses differential privacy as the basis for obfuscating and protecting social media location data. Existing research on trajectories with privacy guarantees has several restrictions and drawbacks. Sampling raw trajectories at fixed time or space intervals is imperfect. Most real-life evaluations of privacy approaches only address short trajectories across a small number of locations (e.g. metro stations) in restricted areas, and the reference systems adopted for sampling raw trajectories are often inefficient and lack privacy guarantees. Chen et al. (2012a) and Chen et al. (2013) provided a non-interactive synthesis of trajectories (sequences) with differential privacy guarantees but restricted their solution to short sequences across metro stations. For larger areas and more realistic scenarios, e.g. analysis of traffic trajectories in cities, this approach is ineffective. Moreover, it is insufficient to build static noisy prefix trees (modelling trajectories), since differential privacy guarantees must still hold as the number of nodes in the tree increases. Chen et al. (2012a) used an n-gram approach to improve this, but this led to loss of trajectory and sequence information. Su et al. (2013) proposed a trajectory calibration solution for publishing trajectories, but without privacy-preserving mechanisms. The model proposed here adopts a new privacy- and accuracy-preserving reference system to extract privacy-demanding feature-based anchors, which can subsequently be used to calibrate sequences from raw trajectories. This provides a private trajectory data sanitisation approach that scales to large spatial domains reflecting realistic trajectories as they occur in large-scale real-life social media data.

The main contributions of this work are: (1) development of a privacy-preserving reference system that can be used for protecting arbitrary trajectories, (2) improved utility of synthesised data under differential privacy guarantees, and (3) support for a privacy-preserving trajectory data publication solution that scales to large space-time domains reflecting the realistic trajectories that arise in large-scale social networks.

The rest of this paper is structured as follows. Section 2 describes related work, focusing on studies that explore the advantages of differential privacy over other approaches for protecting trajectories. Section 3 introduces the preliminary concepts used in the work and provides an overview of the solution. Section 4 presents the Private Reference System adopted in the work. Section 5 presents the Private Trajectories Release approach. Section 6 defines the threat model and assumptions and presents the privacy analysis. Section 7 presents the experimental results for the privacy-preserved trajectories using Twitter data, focusing especially on traffic events identified through social media. Finally, Section 8 draws conclusions on the work as a whole and outlines areas of future work.

Section snippets

Related work

Many solutions have been proposed to detect and estimate public transport issues in urban areas based on large-scale or small-scale trajectory data. These works have clearly demonstrated that whilst individuals' trajectories are highly heterogeneous, they can often be predicted. However, publishing raw individual trajectories raises many privacy issues, many of which are discussed in Xue et al. (2009).

Differential privacy presented in Dwork et al. (2006) has been broadly adopted to

Background

In this section, related preliminary concepts are given, followed by a solution overview. Major notations used in this paper are given in Table 1.
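
For orientation, the standard definitions behind the differential privacy guarantee (Dwork et al. 2006) can be written as below. This is a sketch in generic notation: the symbols $\mathcal{A}$, $D$, $D'$, $f$ and $\epsilon$ are assumptions made here and may differ from the notation in Table 1.

```latex
% A randomised mechanism \mathcal{A} is \epsilon-differentially private if,
% for all neighbouring databases D, D' (differing in one record) and all output sets S:
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S]

% Global (L1) sensitivity of a query f over neighbouring databases:
\Delta f \;=\; \max_{D, D'} \lVert f(D) - f(D') \rVert_{1}

% Laplace mechanism: adding noise with scale \Delta f / \epsilon
% to each coordinate of f(D) satisfies \epsilon-differential privacy:
\mathcal{A}(D) \;=\; f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\epsilon}\right)
```

A smaller $\epsilon$ means more noise and stronger privacy; a larger $\epsilon$ means less noise and higher utility.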

Private reference system

In this section, the method used to establish the private reference system (PRS) is described. Specifically, we focus on Tweet harvesting and location extraction followed by cluster-based anchor aggregation and perturbed release of information to build the reference system. Individual trajectories are aligned to this private reference system according to geometric strategies.
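
The general shape of such a reference system can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it swaps in a fixed grid for the cluster-based aggregation, perturbs cell counts with Laplace noise (sensitivity 1 per point), and aligns points to the nearest surviving anchor. The names `noisy_anchors` and `align`, the `bounds`, `cell` and `min_count` parameters, and the point-level privacy unit are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_anchors(points, bounds, cell, epsilon, min_count, rng):
    """Return centres of grid cells whose noisy point count reaches min_count.

    Assumption: `bounds` = (x0, y0, x1, y1) is public, and every cell in it
    (including empty ones) receives noise, so the released set of anchors
    does not depend on which cells happen to be occupied.
    """
    x0, y0, x1, y1 = bounds
    nx = math.ceil((x1 - x0) / cell)
    ny = math.ceil((y1 - y0) / cell)
    counts = Counter()
    for x, y in points:
        # Clamp indices so points on the boundary fall in a valid cell.
        i = max(0, min(int((x - x0) / cell), nx - 1))
        j = max(0, min(int((y - y0) / cell), ny - 1))
        counts[(i, j)] += 1
    anchors = []
    for i in range(nx):
        for j in range(ny):
            # Each point affects exactly one count by 1: sensitivity 1.
            if counts[(i, j)] + laplace(1.0 / epsilon, rng) >= min_count:
                anchors.append((x0 + (i + 0.5) * cell, y0 + (j + 0.5) * cell))
    return anchors


def align(trajectory, anchors):
    """Map each raw point to its nearest anchor, merging consecutive repeats."""
    aligned = []
    for p in trajectory:
        nearest = min(anchors, key=lambda a: math.dist(a, p))
        if not aligned or aligned[-1] != nearest:
            aligned.append(nearest)
    return aligned
```

In this sketch the raw coordinates never appear in the output: a calibrated trajectory is a sequence of anchor centres, which is what a later release step would operate on.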

Private trajectories release

To protect the privacy of trajectory sequences under differential privacy, we propose a private trajectory release solution (PTR) that is divided into noise-enhanced prefix tree construction and private release generation. Furthermore, we propose an effective post-processing strategy that inserts anchor points into aligned trajectories by evaluating the missing anchor points that may have been passed and revising the global directions. This adaptive noise-enhanced prefix tree construction algorithm (5.1
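
The basic idea of a noisy prefix tree over anchor sequences can be sketched as below. This is a simplified illustration in the style of earlier prefix-tree work, with a uniform privacy budget per level; it is not the adaptive construction this section describes. The function names, the `alphabet` of anchor identifiers, and the `height` and `threshold` parameters are assumptions made for this example.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def prefix_counts(sequences, height):
    """Count how many sequences start with each prefix of length <= height."""
    counts = {}
    for seq in sequences:
        for k in range(1, min(len(seq), height) + 1):
            prefix = tuple(seq[:k])
            counts[prefix] = counts.get(prefix, 0) + 1
    return counts


def noisy_prefix_tree(sequences, alphabet, height, epsilon, threshold, rng):
    """Grow the tree level by level, keeping children whose noisy count passes.

    One sequence contributes to at most one node per level, so each level has
    sensitivity 1. Giving each level epsilon / height (noise scale
    height / epsilon) keeps the total budget at epsilon by sequential
    composition. Candidate children are drawn from the public alphabet, so
    empty prefixes also receive noise.
    """
    true_counts = prefix_counts(sequences, height)
    scale = height / epsilon
    tree = {}
    frontier = [()]
    for _ in range(height):
        next_frontier = []
        for parent in frontier:
            for symbol in alphabet:
                child = parent + (symbol,)
                noisy = true_counts.get(child, 0) + laplace(scale, rng)
                if noisy >= threshold:
                    tree[child] = noisy
                    next_frontier.append(child)
        frontier = next_frontier
    return tree
```

Synthetic sequences could then be generated from the noisy counts stored in `tree`; a higher `threshold` prunes more noise-only branches at the cost of dropping rare but real prefixes.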

Threat model and assumptions

In this scenario, the adversary is assumed to be a dishonest party interested in analysing published data to infer sensitive information through the application protocol. The server that maintains and publishes trajectory data is treated as a trusted third party, whose aim is to prevent the adversary from extracting additional information from the sanitised trajectory data, for example by analysing it to estimate individuals' frequent trajectory patterns and sensitive locations.

Experimental evaluation

In this section, our PTCP system is empirically evaluated using geo-tagged tweets. We measure results in terms of both utility and privacy. We describe our data harvesting and the construction of the raw trajectories database in Section 7.1, followed by the utility evaluation metrics in Section 7.2. Finally, we apply these utility metrics to raw trajectories to obtain empirical results and analysis in Section 7.3.

Conclusions

In this paper, we proposed a Private Trajectories Calibration and Publication System (PTCP) under differential privacy, which can be used to release large-scale trajectories from social media (e.g. Twitter) with privacy guarantees and higher utility. PTCP provides a differentially private reference system that enables sampling of discrete raw trajectories according to obfuscated feature-based GLIs (as anchor points). PTCP adopts a noisy calibrated trajectories (sequences)

Acknowledgment

We would like to thank the NeCTAR Research Cloud for the (free) use of the Cloud resources and the Melbourne eResearch Group for support on Twitter access, use and analysis.

Professor Richard O. Sinnott is Director of eResearch at the University of Melbourne and holds a Professorial Role in Applied Computer Systems. He was formerly technical director of the National e-Science Centre, UK; director of e-Science at the University of Glasgow. He has a PhD in Distributed Systems; a Master of Science in Software Engineering and a Bachelor of Science in Theoretical Physics (Hons). He has published over 200 peer-reviewed papers in conferences/journals across a wide range

References (24)

R. Chen et al. Privacy-preserving trajectory data publishing by local suppression. Inf Sci (Ny) (2013)
M.E. Andrés et al. Geo-indistinguishability: differential privacy for location-based systems (2013)
L. Chen et al. Robust and fast similarity search for moving object trajectories (2005)
R. Chen et al. Differentially private trajectory data publication (2011)
R. Chen et al. Differentially private transit data publication: a case study on the Montreal transportation system (2012)
R. Chen et al. Differentially private sequential data publication via variable-length n-grams (2012)
Z. Chen et al. Discovering popular routes from trajectories (2011)
G. Cormode et al. Differentially private spatial decompositions
Y.-A. de Montjoye et al. Unique in the crowd: the privacy bounds of human mobility. Sci Rep (2013)
R. Dewri. Local differential perturbations: location privacy under approximate knowledge attackers. Mobile Comput, IEEE Trans on (2013)
C. Dwork et al. Calibrating noise to sensitivity in private data analysis. Theory Crypt (2006)
M. Ester et al. A density-based algorithm for discovering clusters in large spatial databases with noise (1996)


Mr Shuo Wang is a PhD at the University of Melbourne. His research interests are in the area of security and big data analytics on Cloud infrastructures.
