BES: Differentially private event aggregation for large-scale IoT-based systems

https://doi.org/10.1016/j.future.2018.07.026

Highlights

  • Maximize differential privacy’s utility based on statistics’ aggregation parameters.

  • Streaming-based prototype on Apache Flink for fast distributed online analysis.

  • Evaluation with real-world AMI data showing excellent accuracy and performance.

  • Show that differentially private aggregation can protect against de-anonymization attacks.

Abstract

The emergence of the Internet of Things (IoT) offers many advantages, but it also raises significant challenges with respect to the efficient and distributed processing of large volumes of data, as well as privacy concerns related to the disclosure of such data.

We investigate these problems from a system perspective and study how differential privacy can complement other privacy-enhancing technologies to allow for controlled disclosure of large data. We present a streaming-based framework, Bes, that leverages the often distributed nature of typical IoT systems to compute differentially private aggregates efficiently. We also propose methods to limit the noise that differential privacy commonly introduces in real-world applications, by bounding outliers based on (differentially private) parameters of the actual system at hand or on data from other similar systems.

We also provide a thorough evaluation of a fully implemented Bes prototype using real-world data from a concrete IoT system, namely an Advanced Metering Infrastructure (AMI). We show how a large number of events can be aggregated privately with low processing latency, even when the processing is performed by a single-board device with capabilities similar to those of the devices deployed in AMIs. Moreover, by implementing a de-pseudonymization attack known from the literature, we show the strong complementary protection that Bes’ differentially private aggregation offers compared to other privacy-enhancing technologies.

Introduction

The emergence of the Internet of Things (IoT), with ubiquitous sensors that can measure, compute and communicate with peers, promises advantages for a diverse range of systems, from large smart cities to more constrained environmental sensor networks supporting, for example, ecological research [1]. One of the pillars of IoT is arguably the sharing of the collected measurements, either as individual values as the measurements take place, or as aggregates over time or over a physical area. This data can then be used to better understand the system under study, or to allow for fine-grained control that increases system efficiency or saves resources.

Even though there are clear advantages to sharing such frequent measurements, there may also be confidentiality and privacy issues, depending on the type of system under study. Take the electrical grid, specifically the Advanced Metering Infrastructure (AMI), as an example. New smart meters can perform high-frequency measurements of electrical properties at the end points (consumers) of the distribution grid, thus supporting better quality of delivered energy as well as enhanced energy management (e.g., energy consumption forecasting or demand–response applications). However, these fine-grained measurements also reveal details about the residents, such as their daily routines, which appliances they own and when these are used [2]. Researchers have even demonstrated that it is possible to discover which TV channel is being watched [3].

What makes IoT, and AMI as a specific case study, interesting is that the collected data often needs to be shared with possibly untrusted third parties, or publicly, to allow for the best system performance. In this setting, mature mechanisms such as encryption become less effective: even though values can be protected during communication, the goal is controlled disclosure, where statistics computed over sensitive data are intentionally shared with the public. Earlier research has suggested protecting such datasets through anonymization or pseudonymization, but others have demonstrated that there is a high risk that an adversary can break such protection mechanisms and obtain privacy-sensitive data [4].

To prevent privacy leaks in such a context, differential privacy [5] has gained considerable interest because of its protection guarantees against strong adversaries. In a nutshell, a method guaranteeing differential privacy sacrifices some of the accuracy of a statistic by adding an appropriate level of noise to it, thus preventing the adversary from gaining knowledge about the individual values contributing to the statistic. Although it offers strong guarantees, the adoption of differential privacy in real-world applications is limited by the introduced noise, which may compromise the utility of the statistic.
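To make the noise-versus-accuracy trade-off concrete, the following is a minimal sketch of the standard Laplace mechanism from Dwork et al. ("Calibrating noise to sensitivity in private data analysis", listed in the references). A query answer is released with noise drawn from a Laplace distribution of scale Δ/ε, where Δ is the query's sensitivity (the most a single individual can change the answer) and ε is the privacy parameter. The helper names, the synthetic readings and the 2 kWh and 50 kWh caps are illustrative assumptions. This is not code from Bes.

```python
import math

import numpy as np


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b = sensitivity / epsilon of the Laplace noise that makes a
    query with the given L1 sensitivity epsilon-differentially private."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("need sensitivity >= 0 and epsilon > 0")
    return sensitivity / epsilon


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: np.random.Generator) -> float:
    """Release true_value perturbed with Laplace(0, sensitivity / epsilon) noise."""
    return true_value + rng.laplace(0.0, laplace_scale(sensitivity, epsilon))


# Synthetic example (not real AMI data): readings of 100 meters in kWh.
rng = np.random.default_rng(seed=7)
readings = rng.uniform(0.0, 2.0, size=100)
true_sum = float(readings.sum())

# For a sum over non-negative readings, the sensitivity is the largest value a
# single meter could report: a hypothetical 2 kWh cap versus a 50 kWh cap.
for max_reading in (2.0, 50.0):
    noisy = laplace_mechanism(true_sum, max_reading, epsilon=0.5, rng=rng)
    noise_std = math.sqrt(2.0) * laplace_scale(max_reading, 0.5)  # std of Laplace(b) is sqrt(2)*b
    print(f"cap={max_reading:5.1f} kWh  noise std={noise_std:7.2f}  "
          f"true={true_sum:7.2f}  noisy={noisy:7.2f}")
```

At the same ε, a 50 kWh cap forces a noise scale 25 times larger than a 2 kWh cap (100 versus 4). This is why a few extreme readings can make an otherwise useful differentially private sum unusable.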

Challenges. The following challenges regarding controlled data disclosure, accuracy and processing efficiency must be addressed for such infrastructures to reach their potential:

  • (i)

    preservation of the privacy of the individuals behind such statistics,

  • (ii)

    maximization of the statistics’ utility given the accuracy degradation introduced by differential privacy, and

  • (iii)

    efficient (in throughput) and distributed computation of such statistics, leveraging the heterogeneous embedded devices composing AMIs, coping with the data volumes they continuously generate and providing statistics in real time.

Based on these observations, the following question steered our research: how can we achieve a controlled disclosure of statistics in an IoT system, using differential privacy to complement other suggested privacy-enhancing technologies (PETs), while (i) minimizing the degradation from differential privacy and (ii) computing such statistics in an efficient and distributed fashion?

Contributions.

We present Bes,¹ a framework that enables efficient processing conforming to differential privacy and that can be deployed on existing infrastructures. Moreover, Bes performs differentially private streaming aggregation (summation) of measurements with an error that is both small and tunable. To achieve this, we decompose the problem into two subproblems: (i) calibrating the privacy-preserving noise by computing approximate sums in a differentially private streaming fashion and (ii) making the approximation process itself differentially private. Simply put, the first subproblem is about how to adjust the added noise while minimizing its disruptiveness, and the second is about making the adjustment process differentially private. We present these two subproblems in detail in Sections 3 and 4. This decomposition admits efficient methods for solving each subproblem, which combine into a low-error, efficient differentially private computation.
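The two subproblems can be illustrated with a small sketch. It assumes that readings are non-negative, that histogram bin edges are fixed from public knowledge (for example a meter's rated maximum), and that the privacy budget is split between the two steps. The first step derives a clamping bound B from a noisy histogram. The second step clamps every reading to [0, B] and releases the sum with Laplace noise of scale B/ε. By sequential composition, the pair of releases is differentially private under the total budget. The quantile-from-histogram rule, the 20% budget split and all function names are illustrative assumptions, not the procedure the paper specifies.

```python
import numpy as np


def dp_bound(values, edges, quantile, epsilon, rng):
    """Choose a clamping bound B from a noisy histogram (epsilon-DP).

    `edges` must be fixed from public knowledge (e.g. a meter's rated
    maximum), never from the private values. Each reading lands in exactly
    one bin, so adding or removing one reading changes the count vector by 1
    in L1 norm, and Laplace noise of scale 1/epsilon per bin suffices.
    """
    counts, _ = np.histogram(np.clip(values, edges[0], edges[-1]), bins=edges)
    noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    cumulative = np.cumsum(noisy)                # post-processing: still DP
    target = quantile * max(cumulative[-1], 1.0)
    hits = np.nonzero(cumulative >= target)[0]
    idx = int(hits[0]) if hits.size else len(counts) - 1
    return float(edges[idx + 1])                 # upper edge of the chosen bin


def dp_clamped_sum(values, bound, epsilon, rng):
    """Clamp readings to [0, bound] and release their sum (epsilon-DP).

    Clamping caps any single reading's influence at `bound`, so the noise
    scale is bound / epsilon no matter how large the outliers are.
    """
    clamped = np.clip(values, 0.0, bound)
    return float(clamped.sum() + rng.laplace(0.0, bound / epsilon))


def bounded_private_sum(values, edges, quantile, epsilon_total, rng, split=0.2):
    """Spend `split` of the budget on B and the rest on the sum. By
    sequential composition the pair (B, noisy sum) is epsilon_total-DP."""
    eps_bound = split * epsilon_total
    eps_sum = epsilon_total - eps_bound
    bound = dp_bound(values, edges, quantile, eps_bound, rng)
    return bound, dp_clamped_sum(values, bound, eps_sum, rng)


# Synthetic readings in kWh: mostly small values plus a few large outliers.
rng = np.random.default_rng(42)
readings = np.concatenate([rng.uniform(0.0, 2.0, 9950), rng.uniform(20.0, 45.0, 50)])
edges = np.linspace(0.0, 50.0, 51)   # hypothetical public range, 1 kWh bins
bound, noisy_sum = bounded_private_sum(readings, edges, 0.99, 1.0, rng)
print(f"B={bound}  exact={readings.sum():.1f}  private={noisy_sum:.1f}")
```

Clamping trades bias for noise. Readings above B are cut down, which biases the sum downward, but the noise scale drops from the largest possible reading divided by ε to B/ε.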

The guarantees of mechanisms that preserve privacy through encryption (e.g., mechanisms encrypting the readings transmitted by users, or homomorphic encryption schemes that could allow untrusted applications to aggregate such readings directly on the ciphertext) may not hold when aggregated data is intentionally shared with the rest of the world. In that case, the aggregator may correlate the decrypted values with external information and attempt de-anonymization [6]. For that reason, we build the protection in Bes on the foundations of differential privacy. In summary, we make the following contributions:

  • (i)

    We provide a method that, based on the trade-off between utility maximization and data privacy preservation, maximizes the utility of a differentially private statistic by controlling its aggregation parameters. From a broader perspective, this allows differential privacy to be practically leveraged in existing IoT systems such as AMIs.

  • (ii)

    We provide a streaming-based design and implementation of Bes based on Apache Flink [7], a state-of-the-art stream processing engine, which allows for fast distributed online analysis.

  • (iii)

    We provide a thorough evaluation, based on a real prototype and conducted with events collected from a real-world advanced metering infrastructure. As shown, Bes allows for accurate privacy-preserving aggregation (with errors that can be lower than 10%) of thousands of events per second, even on inexpensive single-board devices representative of the ones used in IoT systems, such as AMIs.

  • (iv)

    We show that differentially private aggregation can protect against a previously published de-anonymization attack, stressing the importance of combining different privacy-enhancing techniques to protect customers’ energy data.
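Contribution (ii) relies on sliding-window aggregation. The sketch below shows the idea in plain Python rather than Apache Flink. Windows of length WS start every WA time units (the window size and window advance parameters named in Section 5). Each reading is clamped to a bound before it is added, and each window sum is released with Laplace noise once the window closes. It assumes events arrive in timestamp order and uses hypothetical names throughout. It does not reproduce Bes’ Flink operators. Windows that receive no events are simply not emitted, which a production design would need to handle.

```python
import math
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

import numpy as np


def sliding_dp_sums(events: Iterable[Tuple[float, float]], ws: float, wa: float,
                    bound: float, epsilon: float,
                    rng: np.random.Generator) -> Iterator[Tuple[float, float, float]]:
    """Yield (window_start, exact_sum, noisy_sum) for sliding windows.

    `events` are (timestamp, value) pairs in non-decreasing timestamp order.
    Window k covers [k*wa, k*wa + ws), and only k >= 0 is considered. Values
    are clamped to [0, bound], so one reading moves a window sum by at most bound.
    """
    scale = bound / epsilon
    open_windows = defaultdict(float)            # window start -> clamped running sum

    def release(start):
        total = open_windows.pop(start)
        return start, total, total + rng.laplace(0.0, scale)

    for ts, value in events:
        # A window ending at or before ts can receive no further events.
        for start in sorted(s for s in open_windows if s + ws <= ts):
            yield release(start)
        clamped = min(max(value, 0.0), bound)
        first = math.floor((ts - ws) / wa) + 1   # smallest k with k*wa + ws > ts
        last = math.floor(ts / wa)               # largest k with k*wa <= ts
        for k in range(max(first, 0), last + 1):
            open_windows[k * wa] += clamped
    for start in sorted(open_windows):           # end of stream: flush the rest
        yield release(start)


# Synthetic stream: 5 readings per time unit over 12 time units.
rng = np.random.default_rng(3)
stream = [(t, float(v)) for t in range(12) for v in rng.uniform(0.0, 2.0, size=5)]
for start, exact, noisy in sliding_dp_sums(stream, ws=4, wa=2, bound=2.0,
                                           epsilon=0.5, rng=rng):
    print(f"window [{start}, {start + 4}): exact={exact:6.2f}  noisy={noisy:6.2f}")
```

A reading belongs to up to ceil(WS/WA) overlapping windows. Under standard sequential composition, its total privacy cost across those releases is therefore up to ceil(WS/WA) · ε. With WS = 4 and WA = 2 above, that is 2ε.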

The rest of the paper is organized as follows. We introduce the system model, the problem statement and differential privacy in Section 2. We present Bes’ differentially private aggregation mechanism in Section 3. We give an overview of data streaming aggregation and the implementation of Bes in Section 4. In Section 5, we evaluate the system from three perspectives: utility degradation, performance, and the reduction in the efficiency of the de-anonymization attack when using our proposed scheme, compared to previous results. Section 6 discusses related work, while Section 7 concludes the paper and discusses possible continuations that can build on its results.

Section snippets

System model and problem description

As mentioned in the previous section, our goals are (i) to provide a method that maximizes the utility of differentially private data aggregations and provides efficient, distributed, continuous (streaming) processing capabilities and (ii) to study its influence on the effectiveness of other privacy-enhancing methods. In this section we first introduce the system model to define the scope of our study. Even though we use a specific IoT instantiation, the AMI, our results and the Bes processing

The bounding mechanism: privacy and utility trade-offs

We show in this section how a bounding mechanism, as proposed for Bes, can improve utility by reducing the degradation caused by the noise added to enforce differential privacy. As anticipated in Section 1, we enforce differential privacy while preventing the noise from being disruptive by decomposing the problem into two subproblems. In the next section, Section 4, we present the first half, which covers computing approximate sums in a differentially private

Computing differentially private sums in a streaming fashion

One of the motivations of our proposed method, Bes, is to provide scalable and distributed online event aggregation (Section 1). In particular, we aim for an efficient, streaming-based algorithmic implementation, so that Bes can run on the heterogeneous embedded devices composing the Advanced Metering Infrastructure, enabling event aggregation to happen closer to the sources (i.e., processing smart meters’ events in a distributed fashion) and live information to be leveraged in

Experimental evaluation

In this section we evaluate how Bes can be leveraged to maximize the utility of differentially private streaming aggregation and, in support of other privacy-preserving techniques, to prevent attackers from inferring information even when that information comes from, e.g., anonymized data. First, we present how Bes’ parameters (bound B, window size WS and window advance WA) can be chosen to maximize the aggregation utility (i.e., to minimize the MAPE), as discussed in Sections 3 The bounding mechanism:
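The utility measure named above, the mean absolute percentage error (MAPE), is the mean of |s_i − ŝ_i| / |s_i| over all windows, expressed in percent, where s_i is an exact window sum and ŝ_i its noisy release. The sketch computes it and sweeps a few candidate bounds B on synthetic data. Tuning B this way on the private data itself would leak information. The sketch therefore assumes the data stand in for a comparable system whose data may be used freely, in line with the abstract’s suggestion of using data from other similar systems. The gamma-distributed readings, the bound grid, ε = 0.1 and the function names are invented for illustration.

```python
import numpy as np


def mape(true_values, estimates):
    """Mean absolute percentage error, in percent."""
    t = np.asarray(true_values, dtype=float)
    e = np.asarray(estimates, dtype=float)
    if np.any(t == 0):
        raise ValueError("MAPE is undefined when a true value is zero")
    return float(100.0 * np.mean(np.abs(t - e) / np.abs(t)))


def mape_for_bound(windows, bound, epsilon, rng):
    """MAPE of clamped, Laplace-noised window sums against the exact sums.

    `windows` is a list of 1-D arrays of readings, one array per window.
    """
    exact = [float(w.sum()) for w in windows]
    noisy = [float(np.clip(w, 0.0, bound).sum()) + rng.laplace(0.0, bound / epsilon)
             for w in windows]
    return mape(exact, noisy)


# Hypothetical reference data standing in for a comparable, non-private system.
rng = np.random.default_rng(1)
reference = [rng.gamma(shape=2.0, scale=0.5, size=200) for _ in range(50)]
for bound in (0.5, 1.0, 2.0, 4.0, 16.0):
    print(f"B={bound:5.1f}  MAPE={mape_for_bound(reference, bound, 0.1, rng):6.2f}%")
```

Small bounds give a large MAPE because clamping discards much of each reading. Large bounds give a large MAPE because the noise scale B/ε grows. The sweep looks for the region in between.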

Related work

As discussed in Section 1, the ever-increasing volumes of events generated by large-scale IoT systems raise challenges not only with respect to their secure, efficient and distributed processing [32], [33], but also with respect to privacy, especially when events carry information about human beings. To give evidence of the broad scope of this research thread, it should be noted that complementary IoT research has focused on (continuous) analysis of evolutionary streams of binary

Discussion and conclusions

In this paper, we take a system perspective and present Bes, an IoT framework whose goal is the aggregation of large-scale measurements in a privacy-preserving, efficient and distributed fashion.

Focusing on the advanced metering infrastructures, we used real-world data to study how privacy-preserving calibrated noise can be added to aggregated energy consumption measurements in order for them to be differentially private. At the same time, we showed how the error introduced by such noise can be

Acknowledgments

The work has primarily been supported by the European Commission Seventh Framework Programme (FP7/2007–2013) through the FP7-SEC-285477-CRISALIS project, as well as by the SysSec Project under grant agreement 257007, by the Swedish Civil Contingencies Agency (MSB) through the project RICS, by the collaboration framework of Chalmers Energy Area of Advance project “SN7: Algorithms for Adaptiveness and Robustness in Electricity Networks”, by the Swedish Foundation for Strategic Research under the


References (66)

  • Gulisano V. et al. Online and scalable data validation in advanced metering infrastructures.
  • Dwork C. et al. Calibrating noise to sensitivity in private data analysis.
  • McSherry F. Privacy integrated queries.
  • Mármol F. et al. Do not snoop my habits: preserving privacy in the smart grid. IEEE Commun. Mag. (2012).
  • Rottondi C. et al. A data pseudonymization protocol for smart grids.
  • Efthymiou C. et al. Smart grid privacy via anonymization of smart metering data.
  • Bohli J.M. et al. A privacy model for smart metering.
  • Wang S. et al. A randomized response model for privacy preserving smart metering. IEEE Trans. Smart Grid (2012).
  • Jawurek M. et al. SoK: privacy technologies for smart grids – a survey of options.
  • Tudor V. et al. A study on data de-pseudonymization in the smart grid.
  • Ács G. et al. I have a DREAM!: differentially private smart metering.
  • Gulisano V. et al. BES: differentially private and distributed event aggregation in advanced metering infrastructures.
  • Cao J. et al. Efficient and accurate strategies for differentially-private sliding window queries.
  • Gulisano V. et al. Efficient data streaming multiway aggregation through concurrent algorithmic designs and new abstract data types. ACM Trans. Parallel Comput. (2017).
  • Gulisano V. et al. ScaleJoin: a deterministic, disjoint-parallel and skew-resilient stream join.
  • Ji Y. et al. Quality-driven processing of sliding window aggregates over out-of-order data streams.
  • Botev V. et al. Detecting non-technical energy losses through structural periodic patterns in AMI data.
  • Odroid-XU4, 2016.
  • Day W.Y. et al. Differentially private publishing of high-dimensional data using sensitivity control.
  • Tudor V. et al. Employing private data in AMI applications: short term load forecasting using differentially private aggregated data.
  • Cabral J.E. et al. Fraud detection system for high and low voltage electricity consumers based on data mining.
  • Chen X. et al. New publicly verifiable databases with efficient updates. IEEE Trans. Dependable Secure Comput. (2015).
  • Li J. et al. Securely outsourcing attribute-based encryption with checkability. IEEE Trans. Parallel Distrib. Syst. (2014).

Valentin Tudor is working as an Experienced Researcher in IoT Technologies at Ericsson Research, Stockholm, Sweden. He holds an MS and a PhD in Computer Science and Engineering from Chalmers University of Technology. His research focuses mainly on the security and privacy dimensions of modern cyber–physical systems such as advanced metering infrastructures and smart grids. The work presented in this paper was conducted while Valentin Tudor was working at the Department of Computer Science and Engineering, Chalmers University, Gothenburg, Sweden.

Vincenzo Gulisano is an Assistant Professor in the Networks and Systems Division at Chalmers University of Technology. His research focuses on data processing and distributed, parallel, elastic and fault-tolerant data streaming. Dr. Vincenzo Gulisano holds a Ph.D. in Computer Science from the Polytechnic University of Madrid, Spain.

Magnus Almgren is an Associate Professor in cyber–physical systems at Chalmers, investigating security properties of systems with a large societal impact. Dr. Almgren has been a Fulbright Scholar and holds an MS in Engineering Physics from Uppsala University, an MS in Computer Science with distinction in research from Stanford University, and a PhD in Computer Science from Chalmers University of Technology. His expertise lies in cyber security and privacy, and he has worked on application-based intrusion detection systems (IDS), reasoning about conflicting information from several detectors in a larger system, and security and privacy of cyber–physical systems and the Internet of Things.

Marina Papatriantafilou is an Associate Professor and co-head of the Distributed Computing Systems group at Chalmers University of Technology. She holds a PhD in Computer Engineering and Informatics from Patras University. She has also been with the National Research Institute for Mathematics and Computer Science (CWI), Amsterdam, and the Max Planck Institute for Computer Science, Saarbrücken. Her research focuses on robust and efficient distributed algorithms and their applications in multiprocessor/multicore systems and network-based distributed systems, consistency and fine-grain synchronization, including data-stream/big-data processing, efficient processing of varying volumes of data, fault tolerance in multicore/distributed systems, cyber–physical systems, and digitalization.

¹ Named after the Egyptian deity, protector of households. Some preliminary results were presented in Proceedings of the 2nd ACM International Workshop on Cyber–Physical System Security. ACM, 2016 (Gulisano et al. 2016).
