Elsevier

Computers & Security

Volume 67, June 2017, Pages 142-163

Protecting personal trajectories of social media users through differential privacy

https://doi.org/10.1016/j.cose.2017.02.002

Abstract

Road traffic congestion is an important issue in modern cities; however, most existing traffic jam identification solutions rely on expensive facilities such as sensors or transport probe infrastructure, which have high deployment and management costs. As a result, such solutions are not ubiquitously deployed. The extensive use of smart mobile devices with location-based capabilities and the global popularity of microblogging applications like Twitter offer an opportunity to tackle these problems. Twitter users can serve as human traffic sensors providing real-time reflections of current traffic situations. However, these data can contain extensive personal information, which demands privacy-preserving mechanisms for user locations and their current trajectories. Differential privacy can ensure that a degree of privacy in these trajectories is preserved whilst allowing data analysis and mining of the Twitter content. This paper proposes an innovative private trajectories release model and associated algorithms with differential privacy guarantees that consider both data privacy and data utility. This includes development of a private reference system for calibrating separate (raw) users' trajectories across obfuscated anchor points; construction of privacy-supporting noise-enhanced prefix trees to release synthesised data privately; and comprehensive evaluation of both the accuracy and utility of the solutions in terms of a set of evaluation metrics based on real-life tweet-based user trajectories across the city of Melbourne.

Introduction

With the extensive use of smart mobile devices with location-based capabilities and the global popularity of micro-blogging applications like Twitter, new possibilities now exist for the collection of near real-time transport data. Tweets, especially those generated by car drivers, passengers and pedestrians near roads, can be regarded as indicators of real-time traffic conditions. This, however, raises privacy issues. Behavioural identification and pattern mining over trajectories must provide traffic monitoring information while protecting individual privacy, particularly since such information might otherwise be used for malicious purposes against Tweeters. Differential privacy (Dwork et al. 2006) has been broadly adopted to protect the privacy of sensitive data including location and trajectory data. It can rigorously protect the privacy of individual location and trajectory information whilst still providing synthesised data for analysis. Approaches that limit the danger of leaking location privacy would encourage more users to share their location information. However, when spatio-temporal data in social media are released and used to aid in monitoring traffic congestion, several challenges emerge due to the heterogeneity and complex nature of trajectories. Firstly, de Montjoye et al. (2013) showed that over 90% of trajectories can be identified using no more than 4 given locations. Furthermore, coarse and fine sampling approaches lead to significantly different results. Consequently, it is crucial to propose privacy-preserving trajectory publication solutions for social media under differential privacy mechanisms whilst supporting the use of such data more generally, e.g. for congestion identification.

The goal of this paper is to provide a privacy-supporting trajectory calibration and publication system that leverages differential privacy as the basis for social media location data obfuscation and protection. Existing research on trajectories with privacy guarantees has several restrictions and drawbacks. Existing works sample raw trajectories using fixed time or space intervals, which is imperfect. Most real-life evaluations of privacy approaches only address short trajectories across a small number of locations (e.g. metro stations) in restricted areas, and the reference systems adopted for sampling raw trajectories are often inefficient and lack privacy guarantees. Chen et al. (2012a) and Chen et al. (2013) provided a non-interactive synthesis of trajectories (sequences) with differential privacy guarantees but restricted their solution to short sequences across metro stations. When dealing with larger areas and more realistic scenarios, e.g. analysis of traffic trajectories in cities, this approach is ineffective. Moreover, it is insufficient to build static noisy prefix trees (modelling trajectories), since differential privacy guarantees must still hold as the number of nodes in the tree increases. Chen et al. (2012a) used an n-gram approach to improve this, but this led to trajectory and sequence information loss. Su et al. (2013) proposed a trajectory calibration solution to publish trajectories, but without privacy-preserving mechanisms. The model proposed here adopts a new privacy- and accuracy-preserving reference system to extract privacy-demanding feature-based anchors, which can subsequently be used to calibrate sequences from raw trajectories. This provides a private trajectory data sanitisation approach that scales to large spatial domains reflecting realistic trajectories as they occur in large-scale real-life social media data.
The main contributions of this work are: (1) development of a privacy preserving reference system that can be used for protecting arbitrary trajectories, (2) improved utility of synthesised data under differential privacy guarantees, and (3) support for a privacy-preserving trajectory data publication solution that can scale to large space-time domains that reflect realistic trajectories that arise in large-scale social networks.

The rest of this paper is structured as follows. Section 2 describes related work, with a focus on studies that explore the advantages of differential privacy compared to other approaches used for protecting trajectories. Section 3 introduces the preliminary concepts used in the work and provides an overview of the solution. Section 4 presents the Private Reference System adopted in the work. Section 5 presents the Private Trajectories Release that has been adopted. Section 6 defines the threat model and assumptions and performs the privacy analysis. Section 7 presents the experimental results of the privacy-preserved trajectories using Twitter data, focused especially on traffic events identified through social media. Finally, Section 8 draws conclusions on the work as a whole and outlines areas of future work.

Related work

Many solutions have been proposed to detect and estimate public transport issues in urban areas based on large-scale or small-scale trajectory data. These works have clearly demonstrated that whilst individuals' trajectories are highly heterogeneous, they can often be predicted. However, publishing raw individual trajectories raises many privacy issues. Many of these issues are discussed in Xue et al. (2009).

Differential privacy presented in Dwork et al. (2006) has been broadly adopted to

Background

In this section, related preliminary concepts are given, followed by a solution overview. Major notations used in this paper are given in Table 1.
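As an illustration of the kind of preliminary involved, the Laplace mechanism introduced by Dwork et al. (2006) releases a numeric query result after adding noise whose scale is the query sensitivity divided by the privacy budget epsilon. The following is a minimal sketch rather than the paper's own code: the function names, the example count and the choice of epsilon are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    sensitivity is the largest change one individual can cause in the count;
    it is 1 when each user contributes at most once (an assumption here).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical usage: the number of users whose trajectory passes one location.
rng = random.Random(0)
noisy_visits = private_count(120, epsilon=0.5, rng=rng)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the privacy and utility trade-off the paper evaluates.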

Private reference system

In this section, the method used to establish the private reference system (PRS) is described. Specifically, we focus on Tweet harvesting and location extraction followed by cluster-based anchor aggregation and perturbed release of information to build the reference system. Individual trajectories are aligned to this private reference system according to geometric strategies.
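The pipeline described above (aggregate locations into anchors, perturb the anchors, then align each trajectory to them) can be sketched roughly as follows. This is an illustrative simplification and not the paper's PRS algorithm: grid cells stand in for the cluster-based aggregation, nearest-anchor snapping stands in for the geometric alignment strategies, and `noise_scale` is a free parameter that is not calibrated to any sensitivity, so the snippet on its own does not establish a differential privacy guarantee. All function and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_anchors(points, cell_size, noise_scale, rng):
    """Group points by grid cell, take each cell centroid, then perturb it.

    Grid cells are an assumed stand-in for cluster-based aggregation.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    anchors = []
    for key in sorted(cells):
        members = cells[key]
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        anchors.append((cx + laplace_noise(noise_scale, rng),
                        cy + laplace_noise(noise_scale, rng)))
    return anchors


def align(trajectory, anchors):
    """Map each point to its nearest anchor index, merging consecutive repeats."""
    aligned = []
    for point in trajectory:
        nearest = min(range(len(anchors)),
                      key=lambda i: math.dist(point, anchors[i]))
        if not aligned or aligned[-1] != nearest:
            aligned.append(nearest)
    return aligned


# Hypothetical usage with made-up planar coordinates.
rng = random.Random(0)
tweet_locations = [(0.1, 0.1), (0.2, 0.3), (5.1, 5.2), (5.3, 5.4)]
anchors = build_anchors(tweet_locations, cell_size=1.0, noise_scale=0.05, rng=rng)
sequence = align([(0.0, 0.0), (0.3, 0.2), (5.0, 5.0)], anchors)
```

After alignment, each raw trajectory becomes a sequence of anchor indices over a shared, obfuscated reference system, which is the form a prefix tree can then consume.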

Private trajectories release

To protect the privacy of trajectory sequences under differential privacy, we propose a private trajectory release solution (PTR) that is divided into noise-enhanced prefix tree construction and private release generation. Furthermore, we propose an effective post-processing strategy that inserts anchor points into aligned trajectories by evaluating the missing anchor points that may have been passed and revising the global directions. This adaptive noise-enhanced prefix tree construction algorithm (5.1
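To make the idea of a noisy prefix tree concrete, the sketch below builds one over aligned anchor sequences in the simpler uniform-budget style associated with the earlier work of Chen et al., not the adaptive noise-enhanced variant this paper proposes. It assumes a fixed tree height, splits epsilon evenly across levels, adds Laplace noise to the count of every candidate child (including those with a true count of zero) and keeps only children whose noisy count reaches a threshold. The function names, the threshold rule and the example data are assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_prefix_tree(sequences, alphabet, height, epsilon, threshold, rng):
    """Return {prefix tuple: noisy count} for the prefixes kept at each level.

    Each sequence has at most one prefix per level, so the counts of a level
    have sensitivity 1; with budget epsilon / height per level the Laplace
    scale is height / epsilon.
    """
    scale = height / epsilon
    tree = {}
    frontier = [()]  # prefixes whose children are expanded next
    for level in range(1, height + 1):
        true_counts = Counter(
            tuple(s[:level]) for s in sequences if len(s) >= level
        )
        next_frontier = []
        for parent in frontier:
            # Noise is added for every possible child, not only observed ones.
            for symbol in alphabet:
                child = parent + (symbol,)
                noisy = true_counts.get(child, 0) + laplace_noise(scale, rng)
                if noisy >= threshold:
                    tree[child] = noisy
                    next_frontier.append(child)
        frontier = next_frontier
    return tree


# Hypothetical usage: anchors labelled "a", "b", "c".
rng = random.Random(0)
aligned = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a")]
tree = noisy_prefix_tree(aligned, ["a", "b", "c"], height=2,
                         epsilon=1.0, threshold=1.0, rng=rng)
```

Synthetic trajectories can then be generated from the retained noisy counts alone, which is post-processing and consumes no further privacy budget.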

Threat model and assumptions

In this scenario, the adversary is assumed to be a dishonest party that analyses the published data to infer sensitive information through the application protocol, for example by estimating an individual's frequent trajectory patterns and sensitive locations. The server that maintains and publishes the trajectory data is treated as a trusted third party, whose aim is to prevent the adversary from extracting such additional information from the sanitised trajectory data.

Experimental evaluation

In this section, our PTCP is empirically evaluated using geo-spatially tagged tweets. We measure results in terms of both utility and privacy. We describe our data harvesting and raw trajectories database construction in Section 7.1, followed by the utility evaluation metrics in Section 7.2. Finally, we apply these utility metrics to raw trajectories to obtain empirical results and analysis in Section 7.3.

Conclusions

In this paper, we proposed a Private Trajectories Calibration and Publication System (PTCP) under Differential Privacy, which can be used to release large-scale trajectories in social media (e.g. Twitter) with privacy guarantees and higher utility. Our PTCP approach provides a differentially private reference system that enables sampling of discrete raw trajectories according to obfuscated feature-based GLIs (as anchor points). PTCP adopts a noisy calibrated trajectories (sequences)

Acknowledgment

We would like to thank the NeCTAR Research Cloud for the (free) use of the Cloud resources and the Melbourne eResearch Group for support on Twitter access, use and analysis.

References (24)

  • R. Chen et al. Privacy-preserving trajectory data publishing by local suppression. Inf Sci (Ny) (2013)
  • M.E. Andrés et al. Geo-indistinguishability: differential privacy for location-based systems (2013)
  • L. Chen et al. Robust and fast similarity search for moving object trajectories (2005)
  • R. Chen et al. Differentially private trajectory data publication (2011)
  • R. Chen et al. Differentially private transit data publication: a case study on the Montreal transportation system (2012)
  • R. Chen et al. Differentially private sequential data publication via variable-length n-grams (2012)
  • Z. Chen et al. Discovering popular routes from trajectories (2011)
  • G. Cormode et al. Differentially private spatial decompositions
  • Y.-A. de Montjoye et al. Unique in the crowd: the privacy bounds of human mobility. Sci Rep (2013)
  • R. Dewri. Local differential perturbations: location privacy under approximate knowledge attackers. Mobile Comput, IEEE Trans on (2013)
  • C. Dwork et al. Calibrating noise to sensitivity in private data analysis. Theory Crypt (2006)
  • M. Ester et al. A density-based algorithm for discovering clusters in large spatial databases with noise (1996)

    Professor Richard O. Sinnott is Director of eResearch at the University of Melbourne and holds a Professorial Role in Applied Computer Systems. He was formerly technical director of the National e-Science Centre, UK; director of e-Science at the University of Glasgow. He has a PhD in Distributed Systems; a Master of Science in Software Engineering and a Bachelor of Science in Theoretical Physics (Hons). He has published over 200 peer-reviewed papers in conferences/journals across a wide range of computing science areas with specific focus over the last ten years in supporting communities demanding finer-grained access control (security).

    Mr Shuo Wang is a PhD at the University of Melbourne. His research interests are in the area of security and big data analytics on Cloud infrastructures.
