Information Fusion

Volume 76, December 2021, Pages 298-314

Differentially private data fusion and deep learning Framework for Cyber–Physical–Social Systems: State-of-the-art and perspectives

https://doi.org/10.1016/j.inffus.2021.04.017

Highlights

  • We review the design requirements of differential privacy for CPSS applications.

  • We review data fusion in CPSS and differentially private deep learning in CPSS.

  • We propose a differentially private data fusion and deep learning framework for CPSS.

  • We discuss the future research directions for CPSSs.

Abstract

Modern technological advancement has driven the growth of the cyber–physical system and the cyber–social system into a more advanced computing system, the cyber–physical–social system (CPSS). CPSS therefore leads the data science revolution by promoting information resources across three spaces rather than a single space. The establishment of CPSSs, however, raises related privacy concerns. To protect CPSS data, various privacy-preserving schemes have been introduced in the recent past, yet technological advancement in CPSSs requires modifying previous techniques to suit their dynamics. Meanwhile, differential privacy has emerged as an effective method to safeguard CPSS data privacy. To fully comprehend the state-of-the-art developments and identify the field’s research directions, this article provides a comprehensive review of differentially private data fusion and deep learning in CPSSs. Additionally, we present a novel differentially private data fusion and deep learning framework for cyber–physical–social systems, as well as various future research directions for CPSSs.

Introduction

The Cyber–Physical–Social System (CPSS) is a computing system that originates from the technological growth of the CPS (Cyber–Physical System) and the CSS (Cyber–Social System) and provides a new platform for the growth of pervasive computing [1]. A CPS maps physical-space objects into cyber space, whereas a CSS maps social-space objects into cyber space [2]. Additionally, social behaviors and interactions are incorporated into the CPS because of ever-increasing human-centric computing [3], [4]. With technological advancement in information, computing and communication, CPSS has proved to be among the fundamental shifts toward delivering personalized and proactive services by meeting people’s social-interaction demands and their reactions to the physical environment [3], [5], thereby offering people from various communities improved, customized services. Additional introductory details on CPSS concepts can be found in [6], [7].

The ever-increasing popularity of CPSS implementations leads to the generation of large-scale heterogeneous data from numerous sources, such as user responses, sensor observations, social networks, and measuring instruments. Data sharing and association are pivotal for sustaining the quality of services in pervasive environments such as smart homes and societies [8]. Thus, through data fusion, CPSS fully integrates CPS and CSS capabilities by fusing data from physical space, cyber space, and social space to enhance data quality and thereby promote quality decision making [3].

Conventional data fusion processes typically fuse data originating from physical space. However, CPSS data exist in three spaces (i.e., cyber, physical and social spaces), which makes merging the data a challenging task. In particular, all facets of data in social networks should be taken into account, which implies that data fusion in CPSS requires sophisticated algorithms and data-processing techniques [1], [6], [7], [9]. In this perspective, a human in a data-fusion-based system becomes both a data provider and a data consumer, which yields valuable inferences that are missing from customary physical sensors [3], [10].

Deep learning represents a promising method for the precise mining of information in CPSS. The last few years have seen significant achievements of deep learning methods in numerous machine learning/data mining tasks, ranging from data analytics and autonomous systems to signal and information processing [16]. This success greatly depends on the substantial collection of CPSS data, making deep learning widely accepted in healthcare applications such as phenotype extraction, critical disease progression prediction, and others [17], [18]. There is therefore an obvious danger posed to the privacy of deep learning models built on users’ sensitive data, such as healthcare records. For instance, [19] proves that the private information of an individual in the training dataset can be recovered by repeatedly querying the output probabilities of a disease-recognition classifier built upon a CNN. Such privacy worries discourage users from sharing their data and thereby hinder the future growth of deep learning itself.

However, the architectural complexity of CPSS makes the assessment of privacy threats challenging, and new privacy concerns arise [20]. Moreover, CPSSs depend on numerous data centers and sensors holding vast amounts of private and personal data. For instance, wearable patient devices continuously report real-time data to doctors for consultation [21]. Without a proper privacy-preservation approach, the individual privacy of CPSS users may be compromised [22].

Attacks on CPSS can be categorized as security-based (active) and privacy-based (passive). Passive attacks aim to access private and individual data shared in public datasets [23], [24]. Several studies propose cryptographic methods to protect data privacy [11]. However, cryptographic methods are inherently computationally expensive, and in scenarios requiring public sharing of data it becomes even more challenging to guarantee privacy [12]. Likewise, anonymization methods such as k-anonymity [13] have been proposed to tackle privacy concerns. However, anonymization methods fail to ensure the required level of privacy [25]; in particular, as the number of attributes in a dataset increases, the data become vulnerable to re-identification [14].
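
For readers unfamiliar with the anonymization baseline being contrasted here, the toy sketch below checks whether a table satisfies k-anonymity over a chosen set of quasi-identifiers; the records, column names and helper function are illustrative assumptions only, not taken from the surveyed works.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """A table is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy example with ZIP code and age range acting as quasi-identifiers.
records = [
    {"zip": "53715", "age": "20-29", "diagnosis": "flu"},
    {"zip": "53715", "age": "20-29", "diagnosis": "cold"},
    {"zip": "53711", "age": "30-39", "diagnosis": "asthma"},
]
print(is_k_anonymous(records, ["zip", "age"], k=2))  # False: the last group has one record
```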

As a rigorous privacy notion, differential privacy [15] is considered the future of privacy and a promising scheme for preserving privacy in deep learning and data fusion. Differential privacy safeguards real-time or statistical data by inserting an appropriate quantity of noise while retaining a healthy trade-off between accuracy and privacy. In particular, differential privacy offers individuals a provable privacy guarantee, benefiting from a solid theoretical foundation compared with other privacy-preserving schemes [14], [26], [27].

Additionally, by adjusting the privacy budget value, differential privacy can trade off gracefully between utility and privacy, where a larger privacy budget implies a weaker privacy guarantee. Furthermore, it assures data owners that adversaries cannot confidently infer any information about a single record from the released deep learning models, even if all the remaining records in the dataset are known to the adversary [28].
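
For reference, this guarantee is formalized by the standard definition of ε-differential privacy due to Dwork [15]; the statement below is the textbook form of the definition, not a result specific to the surveyed works.

```latex
% \epsilon-differential privacy (Dwork [15]): a randomized mechanism M is
% \epsilon-differentially private if, for all neighboring datasets D and D'
% differing in a single record, and for every set of outputs
% S \subseteq \mathrm{Range}(M),
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \cdot \Pr[M(D') \in S].
% A smaller privacy budget \epsilon forces the output distributions on D and D'
% to be nearly indistinguishable, i.e., stronger privacy; a larger \epsilon
% weakens the guarantee but typically improves utility.
```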

Differential privacy has proved capable of preserving the privacy of large quantities of data, from real-time data streams to databases [29]. It realizes privacy preservation by adding a carefully calibrated quantity of noise to the output results or model according to robust mechanisms, rather than merely anonymizing individual data. This is possible because differential privacy provides efficient and effective schemes for solving privacy-preservation problems using basic mechanisms such as the functional mechanism [30], the exponential mechanism [31], the Gaussian mechanism [32] and the Laplace mechanism [11]. These mechanisms satisfy the need for privacy preservation by combining the strength of differential privacy with a wide range of non-private deep learning prototypes (see Table 1).
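
As a concrete illustration of how such mechanisms calibrate noise, the following minimal sketch implements the classical Laplace mechanism for a numeric query; the function name and the counting-query example are illustrative assumptions rather than components of any surveyed system.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release a numeric query answer under epsilon-differential privacy
    by adding Laplace noise with scale sensitivity / epsilon."""
    rng = np.random.default_rng() if rng is None else rng
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a counting query has L1-sensitivity 1, since adding or removing
# one record changes the count by at most 1. A smaller epsilon adds more
# noise and hence gives a stronger privacy guarantee.
true_count = 128
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps))
```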

However, there is very little research on differentially private deep learning and data fusion for CPSSs. Dwork [33] and Ji [34] focus on a few basic differential-privacy-based machine learning methods and do not address differentially private data fusion and deep learning. Therefore, at present, the community lacks a comprehensive synopsis of state-of-the-art methods on differentially private deep learning and data fusion for CPSSs. Inspired by this, we devise a comprehensive survey on differential privacy for deep learning and data fusion in CPSS. Specifically, the survey offers the following major contributions.

  • We propose a novel differentially private data fusion and deep learning framework for CPSS.

  • We offer a thorough and comprehensive analysis of the state-of-the-art techniques. Consequently, the survey puts forward new perspectives for a clear understanding of existing works and thus helps improve the privacy levels of deep learning and data fusion techniques.

  • To enable timely and forthcoming research in the area of privacy-preserving data fusion and learning, we suggest several promising future research directions.

We organize this article as follows. Section 2 provides a brief introduction of the key CPSS concepts, including system architectures, applications, and the motives to use differential privacy in CPSS. Section 3 surveys key concepts of differential privacy, including privacy attacks on deep learning, the privacy budget, and various differential privacy mechanisms. Section 4 reviews data fusion in CPSS by focusing on CPSS data requirements for data fusion, data fusion techniques, and differentially private data fusion. Section 5 describes differentially private deep learning for CPSS. We propose the CPSS data fusion and deep learning framework design and provide the future work and challenges of CPSS data fusion and deep learning in Section 6. Conclusions are included in Section 7.

Section snippets

Tensor operations and tensor decompositions

Tensors generalize matrices to higher-order spaces; commonly referred to as multi-dimensional arrays, they are used to represent additional variable types in higher dimensions [3]. In a tensor, the number of dimensions represents the tensor’s order. Tensor factorization represents an interactive approach to analyzing sets of matrices that share common sizes [35]. The information contained in a given heterogeneous set of data can be merged by factorizing every data set through a tensor decomposition
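
To make the notions of tensor order and matricization concrete, the short sketch below builds a third-order tensor as a NumPy array and computes its mode-n unfolding, the matrix view on which most factorization algorithms operate; the helper name unfold and the toy dimensions are assumptions for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding (matricization): rearrange an order-N tensor into a
    matrix whose rows index the chosen mode and whose columns index the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# A third-order tensor (order = number of dimensions = 3), e.g. one axis each
# for physical, cyber and social attributes of the same set of entities.
X = np.arange(2 * 3 * 4).reshape(2, 3, 4)

print(X.ndim)              # 3 -> the tensor's order
print(unfold(X, 0).shape)  # (2, 12)
print(unfold(X, 1).shape)  # (3, 8)
print(unfold(X, 2).shape)  # (4, 6)
```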

Differential privacy

As Section 2 narrates, various CPSS applications comprise sensitive and private data; however, CPSS data are under constant attack by adversaries, who target critical systems to gain partial or complete access to sensitive information. This part of the survey delves into the various privacy attacks that are closely related to CPSSs and the recent notion of privacy guarantee given by differential privacy [67].

Data fusion in CPSSs

This part surveys data fusion techniques, CPSS data fusion requirements, data fusion for deep learning, and differentially private data fusion. These are vital to achieving private data fusion and deep learning for CPSS.

Differentially private deep learning in CPSS

In this section, we first introduce the objective perturbation mechanism for achieving differentially private deep learning in CPSS. Then, three differentially private protection levels, illustrated by recent research, are discussed.
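
As one concrete example of perturbation during model training, the sketch below shows gradient-level perturbation in the style of DP-SGD (Abadi et al., cited in the references): per-example gradients are clipped to a norm bound and Gaussian noise is added before each update. This is a minimal illustration of one common protection level, not the objective perturbation mechanism reviewed in this section; the logistic-regression model and all hyperparameters are assumptions for demonstration.

```python
import numpy as np

def dp_sgd_logreg(X, y, epochs=5, lr=0.1, clip=1.0, noise_mult=1.0,
                  batch_size=32, seed=0):
    """Gradient-perturbation training sketch (DP-SGD style): clip each
    per-example gradient to L2 norm `clip`, add Gaussian noise with
    standard deviation noise_mult * clip, then take an averaged step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-xb @ w))       # per-example predictions
            grads = (p - yb)[:, None] * xb           # per-example gradients, shape (B, d)
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip)   # clip to L2 norm <= clip
            noisy_sum = grads.sum(axis=0) + rng.normal(0.0, noise_mult * clip, d)
            w -= lr * noisy_sum / len(xb)
    return w
```

In practice, the privacy budget actually spent by such training depends on the noise multiplier, the sampling rate and the number of steps, and is tracked with a privacy accountant.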

Proposed framework

As Sections 4 (Data fusion in CPSSs) and 5 (Differentially private deep learning in CPSS) describe, there are some existing differentially private methods for data fusion and deep learning, respectively. Unfortunately, none of the existing methods takes both techniques into account. As a result, key aspects of data fusion are isolated when dealing with deep learning, leading to a failure to realize their interoperation. To address this challenging problem, we propose a Differentially Private data fusion

Conclusion

Technological advancement has made cyber–physical–social systems (CPSS) an integral part of our daily lives. However, most deep learning techniques used for CPSS are vulnerable to various privacy threats during model training and prediction. Therefore, it is paramount to enhance the privacy of deep learning methods, especially those deployed in domains with critical and sensitive information. Throughout this article, we give a thorough review on preserving privacy in data fusion and deep learning

CRediT authorship contribution statement

Nicholaus J. Gati: Conception and design of study, Writing - original draft, Writing - review & editing. Laurence T. Yang: Conception and design of study, Writing - original draft, Writing - review & editing. Jun Feng: Conception and design of study, Writing - original draft, Writing - review & editing. Xin Nie: Conception and design of study, Writing - original draft. Zhian Ren: Conception and design of study, Writing - original draft. Samwel K. Tarus: Conception and design of study, Writing -

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Key Research and Development Plan of China under Grant 2017YFB0801804, Grant 2018YFB1800103, and Grant 2019YFB1170062; in part by the National Natural Science Foundation of China under Grant 62002128 and Grant 61932010; and in part by the Fundamental Research Funds for the Central Universities under Grant 2018KFYXKJC046.

References (150)

  • Boada B.L. et al., Vehicle sideslip angle measurement based on sensor data fusion using an integrated ANFIS and an unscented Kalman filter algorithm, Mech. Syst. Signal Process. (2016)
  • Moshou D. et al., Water stress detection based on optical multisensor fusion with a least squares support vector machine classifier, Biosystems Eng. (2014)
  • Roussel S. et al., Fusion of aroma, FT–IR and UV sensor data based on the Bayesian inference. Application to the discrimination of white grape varieties, Chemometr. Intell. Lab. Syst. (2003)
  • Wu W. et al., A multifocus image fusion method by using hidden Markov model, Opt. Commun. (2013)
  • Xu Z. et al., Information fusion for intuitionistic fuzzy decision making: An overview, Inf. Fusion (2016)
  • Wang F.-Y., The emergence of intelligent enterprises: From CPS to CPSS, IEEE Intell. Syst. (2010)
  • Kekulluoglu D. et al., Preserving privacy as social responsibility in online social networks, ACM Trans. Internet Technol. (2018)
  • Feng J. et al., Privacy preserving high-order bi-lanczos in cloud-fog computing for industrial applications, IEEE Trans. Ind. Inform. (2020)
  • Ma J. et al., Towards a smart world and ubiquitous intelligence: A walkthrough from smart things to smart hyperspaces and ubickids, Int. J. Pervasive Comput. Commun. (2005)
  • Sheth A. et al., Physical-cyber-social computing: An early 21st century approach, IEEE Intell. Syst. (2013)
  • Tang M. et al., Big data for cybersecurity: Vulnerability disclosure trends and dependencies, IEEE Trans. Big Data (2017)
  • Dwork C. et al., Theory of cryptography, Lect. Notes Comput. Sci. Ser. (2006)
  • Zhang Q. et al., Privacy preserving deep computation model on cloud for big data feature learning, IEEE Trans. Comput. (2015)
  • Sweeney L., K-anonymity: A model for protecting privacy, Int. J. Uncertain. Fuzziness Knowl.-Based Syst. (2002)
  • Li N. et al., t-closeness: Privacy beyond k-anonymity and l-diversity
  • Dwork C., Differential privacy: A survey of results
  • Li K.-C. et al., Big Data: Algorithms, Analytics, and Applications (2015)
  • Cheng Y. et al., Risk prediction with electronic health records: A deep learning approach
  • Fredrikson M., Lantz E., Jha S., Lin S., Page D., Ristenpart T., Privacy in pharmacogenetics: An end-to-end case study...
  • Feng J. et al., Privacy-preserving tensor decomposition over encrypted data in a federated cloud environment, IEEE Trans. Dependable Secure Comput. (2020)
  • Gowtham M. et al., Privacy enhanced data communication protocol for wireless body area network
  • Giraldo J. et al., Security and privacy in cyber-physical systems: A survey of surveys, IEEE Des. Test (2017)
  • He D. et al., Security analysis and improvement of a secure and distributed reprogramming protocol for wireless sensor networks, IEEE Trans. Ind. Electron. (2012)
  • Narayanan A. et al., Robust de-anonymization of large sparse datasets
  • Aggarwal C.C., On k-anonymity and the curse of dimensionality, in: Proceedings of the VLDB, Vol. 5, 2005, pp....
  • Feng J. et al., Edge-cloud-aided differentially private tucker decomposition for cyber-physical-social systems, IEEE Internet Things J. (2020)
  • Dwork C., A firm foundation for private data analysis, Commun. ACM (2011)
  • Cao Y. et al., Quantifying differential privacy in continuous data release under temporal correlations, IEEE Trans. Knowl. Data Eng. (2018)
  • Chaudhuri K. et al., Privacy-preserving logistic regression
  • McSherry F., Talwar K., Differential privacy in mechanism design, in: Proceedings of the IEEE Symposium on the...
  • Abadi M., Chu A., Goodfellow I., McMahan H.B., Mironov I., Talwar K., Zhang L., Deep learning with differential...
  • Dwork C. et al., The algorithmic foundations of differential privacy, Found. Trends Theor. Comput. Sci. (2014)
  • Jing L. et al., An adaptive multi-sensor data fusion method based on deep convolutional neural networks for fault diagnosis of planetary gearbox, Sensors (2017)
  • Sorber L., Data fusion: Tensor factorizations by complex optimization (data fusie: tensor factorisaties dmv complexe optimalisatie) (2014)
  • Cichocki A., Era of big data processing: A new approach via tensor networks and tensor decompositions (2014)
  • Cook D.J. et al., CASAS: A smart home in a box, Computer (2012)
  • Helal S. et al., The gator tech smart house: A programmable pervasive space, Computer (2005)
  • Kao Y.W., Yuan S.-M., USHAF: A user-modifiable semantic home automation framework, in: 2nd International Conference on...
  • Kientz J.A. et al., The Georgia tech aware home
  • Lin C.-C. et al., A triangular nodetrix visualization interface for overlapping social community structures of cyber-physical-social systems in smart factories, IEEE Trans. Emerg. Top. Comput. (2017)