Differentially private data fusion and deep learning framework for Cyber–Physical–Social Systems: State-of-the-art and perspectives
Introduction
Cyber–Physical–Social Systems (CPSS), as a computing paradigm, originate from the technological growth of CPS (Cyber–Physical Systems) and CSS (Cyber–Social Systems) and provide a new platform of growth for pervasive computing [1]. Specifically, CPS provides a mapping of physical-space objects into cyberspace, whereas CSS provides a mapping of social-space objects into cyberspace [2]. Additionally, social behaviors and interactions are encompassed in the CPS because of ever-increasing human-centric computing [3], [4]. With the technological advancement in information, computing, and communication, CPSS has proved to be among the fundamental shifts toward delivering personalized and proactive services to human beings by meeting people's social-interaction demands and their reactions to the physical environment [3], [5], thereby helping to offer people from various communities improved customized services. Additional introductory details on CPSS concepts can be found in [6], [7].
The ever-increasing popularity of CPSS implementations leads to the generation of large-scale heterogeneous data from numerous sources, such as user responses, sensor observations, social networks, and measuring instruments. Data sharing and association prove to be pivotal components in sustaining the quality of services in pervasive environments like smart homes and societies [8]. Thus, through data fusion, CPSS provides full integration of CPS and CSS capabilities by fusing data from physical space, cyberspace, and social space to enhance the quality of data, hence promoting quality decision making [3].
Typically, the usual data fusion processes fuse data originating from physical space. However, CPSS data exist in three spaces (i.e., cyber, physical, and social spaces), which makes merging the data a challenging task. Precisely, all facets of data in social networks should be taken into account, which implies that data fusion in CPSS needs sophisticated algorithms and data processing techniques [1], [6], [7], [9]. From this perspective, a human in a data-fusion-based system becomes both a data provider and a data consumer, which yields valuable inferences that are missing from customary physical sensors [3], [10].
Deep learning represents a promising method for precise mining of information in CPSS. The last few years have seen significant achievements of deep learning methods in numerous machine learning/data mining tasks, ranging from data analytics and autonomous systems to signal and information processing [16]. This success greatly depends on the substantial collection of CPSS data, making deep learning widely accepted in healthcare applications such as phenotype extraction, critical disease development prediction, and others [17], [18]. Therefore, there is an obvious danger posed to the privacy of deep learning models trained on users' sensitive data, such as healthcare records. For instance, [19] proves that the private information of an individual in the training dataset can be recovered by repetitively querying the output probabilities of a CNN-based classifier for disease recognition. Existing privacy worries discourage users from sharing their data and thereby hinder the future growth of deep learning itself.
However, the architectural complexity of CPSS makes the assessment of privacy threats challenging, and new privacy concerns arise [20]. Moreover, CPSSs depend on numerous data centers and sensors comprising a vast amount of private and personal data. For instance, wearable patient devices continuously report real-time data to doctors for consultation [21]. Without a proper privacy-preservation approach, the individual privacy of CPSS users may be compromised [22].
Attacks on CPSS can be categorized as security-based (active) and privacy-based (passive). Passive attacks have the objective of accessing private and individual data shared in public datasets [23], [24]. Several studies propose various cryptographic methods to protect the privacy of data [11]. However, cryptographic methods are inherently computationally expensive. Furthermore, in scenarios requiring public sharing of data, it becomes more challenging to guarantee privacy [12]. Likewise, anonymization methods like k-anonymity [13] have been proposed to tackle privacy concerns. However, anonymization methods fail to ensure the required level of privacy [25]; especially as the number of attributes in datasets increases, they become vulnerable to re-identification [14].
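The weakness of k-anonymity described above can be illustrated with a minimal sketch (the table, attribute names, and helper function are hypothetical, not drawn from any cited work): a released table is k-anonymous only if every combination of quasi-identifier values is shared by at least k records, so a single unusual record drops the guarantee to k = 1 and remains re-identifiable.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k-anonymity level of a table: the size of the smallest
    group of records sharing identical quasi-identifier values."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values())

# Toy released medical table; 'zip' and 'age' act as quasi-identifiers.
records = [
    {"zip": "537**", "age": "20-30", "disease": "flu"},
    {"zip": "537**", "age": "20-30", "disease": "cold"},
    {"zip": "537**", "age": "30-40", "disease": "cancer"},  # unique group
]
print(k_anonymity(records, ["zip", "age"]))  # → 1: the last record is re-identifiable
```

Adding more attributes shrinks the groups further, which is exactly why high-dimensional datasets resist effective anonymization.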
As a concrete privacy notion, differential privacy [15] is considered the future of privacy. It is a promising scheme for preserving privacy in deep learning and data fusion. Differential privacy safeguards real-time or statistical data by injecting an appropriate quantity of noise while retaining a healthy trade-off between accuracy and privacy. In particular, differential privacy offers individuals a provable privacy assurance, benefiting from a concrete theoretical foundation compared with other privacy-preserving schemes [14], [26], [27].
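For completeness, the standard formal guarantee referred to here [15] can be stated as follows: a randomized mechanism $\mathcal{M}$ satisfies $\varepsilon$-differential privacy if, for any two neighboring datasets $D$ and $D'$ differing in at most one record, and for every set of outputs $S$,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S],
```

so the presence or absence of any single individual changes the output distribution by at most a factor of $e^{\varepsilon}$, where $\varepsilon$ is the privacy budget.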
Additionally, through adjustment of the privacy-budget value, differential privacy can trade off gracefully between utility and privacy, where larger privacy budgets imply a weaker privacy guarantee. Furthermore, it assures data owners that adversaries are not capable of confidently inferring any information related to a single record from the released deep learning models, even if all the remaining records in the dataset are known to the adversary [28].
Differential privacy proves capable of preserving the privacy of large quantities of data from real-time streams and databases [29]. It realizes privacy preservation through the addition of a carefully calibrated quantity of noise to the output results or model according to robust mechanisms, rather than merely anonymizing individual data. This is achieved thanks to differential privacy's capability to provide efficient and effective schemes for privacy-preservation problems using basic methods like the functional mechanism [30], the exponential mechanism [31], the Gaussian mechanism [32], and the Laplace mechanism [11]. These mechanisms satisfy the need for privacy preservation by combining differential privacy strengths with a wide range of non-private deep learning prototypes (see Table 1).
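The calibrated-noise idea behind these mechanisms can be sketched with the classic Laplace mechanism on a counting query (the function name, query, and numbers below are illustrative only): noise drawn from Laplace(sensitivity/ε) is added to the true answer, so a larger privacy budget ε yields less noise and a weaker guarantee.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with epsilon-differential privacy by adding
    noise drawn from Laplace(0, sensitivity / epsilon)."""
    if rng is None:
        rng = np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

# Counting query: how many patients have the disease? Its sensitivity is 1,
# since adding or removing one record changes the count by at most 1.
true_count = 42
for eps in (0.1, 1.0, 10.0):  # larger epsilon -> less noise, weaker privacy
    noisy = laplace_mechanism(true_count, sensitivity=1, epsilon=eps,
                              rng=np.random.default_rng(0))
    print(f"epsilon={eps}: noisy count = {noisy:.1f}")
```

With a fixed random seed, the same underlying draw is scaled by 1/ε, making the utility/privacy trade-off controlled by the budget directly visible.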
However, there is very little research on differentially private deep learning and data fusion for CPSSs. Dwork [33] and Ji [34] focus on a few basic differential-privacy-based machine learning methods and do not address differentially private data fusion and deep learning. Therefore, at present, the community lacks a comprehensive synopsis of state-of-the-art methods for differential-privacy-preserving deep learning and data fusion in CPSSs. Inspired by this, we devise a comprehensive survey on differential privacy for deep learning and data fusion in CPSS. Specifically, the survey offers the following major contributions.
- We propose a novel differentially private data fusion and deep learning framework for CPSS.
- We offer a thorough and comprehensive analysis of the state-of-the-art techniques. Consequently, the survey puts forward new perspectives for a clear understanding of the existing works and, thus, for improving the privacy levels of deep learning and data fusion techniques.
- To enable timely and forthcoming research in the area of privacy-preserving data fusion and learning, we suggest various promising future research directions.
We organize this survey as follows. Section 2 provides a brief introduction to the key CPSS concepts, including system architectures, applications, and the motives for using differential privacy in CPSS. Section 3 surveys key concepts of differential privacy, including privacy attacks on deep learning, the privacy budget, and various differential privacy mechanisms. Section 4 reviews data fusion in CPSS, focusing on CPSS data requirements for data fusion, data fusion techniques, and differentially private data fusion. Section 5 describes differentially private deep learning for CPSS. We propose a CPSS data fusion and deep learning framework design and provide the future work and challenges of CPSS data fusion and deep learning in Section 6. Conclusions are given in Section 7.
Section snippets
Tensor operations and tensor decompositions
Tensors, commonly referred to as multi-dimensional arrays, generalize matrices to higher-order spaces and are used to represent additional variable types in higher dimensions [3]. In a tensor, the number of dimensions represents the tensor's order. Tensor factorization represents an interactive approach to analyzing sets of matrices of common size [35]. The information contained in a given heterogeneous set of data can be merged by factorizing every data set through a tensor decomposition
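A minimal sketch of these ideas (illustrative only, not the decomposition algorithm used in any cited work): a rank-1 third-order tensor is the outer product of three vectors, and its rank-1 structure is visible in the singular values of any matrix unfolding, which is the basic observation behind CP-style decompositions.

```python
import numpy as np

# Build a rank-1 order-3 tensor as the outer product of three vectors.
a, b, c = np.array([1., 2.]), np.array([3., 4., 5.]), np.array([6., 7.])
T = np.einsum("i,j,k->ijk", a, b, c)   # shape (2, 3, 2), order 3

# Mode-1 unfolding: flatten the last two modes into a 2 x 6 matrix.
T1 = T.reshape(2, -1)
u, s, vt = np.linalg.svd(T1, full_matrices=False)

# A rank-1 tensor leaves exactly one nonzero singular value per unfolding,
# and the leading singular triplet reconstructs the unfolding exactly.
print(np.round(s, 6))
print(np.allclose(T1, s[0] * np.outer(u[:, 0], vt[0])))  # → True
```

Fusing several heterogeneous data sets then amounts to jointly factorizing their tensors so that shared factor matrices capture the common structure.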
Differential privacy
As Section 2 narrates, various CPSS applications comprise sensitive and private data; however, CPSS data are under constant attack by adversaries. They attack critical systems so as to gain partial or complete access to sensitive information. This part of the survey delves into the various privacy attacks that are closely related to CPSSs and the recent notion of privacy guarantee given by differential privacy [67].
Data fusion in CPSSs
This part surveys data fusion techniques, CPSS data fusion requirements, data fusion for deep learning, and differentially private data fusion. They are vital to achieving private data fusion and deep learning for CPSS.
Differentially private deep learning in CPSS
In this section, we first introduce the objective perturbation mechanism for achieving differentially private deep learning in CPSS. Then, three differentially private protection levels based on recent research are illustrated and discussed.
Proposed framework
As Sections 4 (Data fusion in CPSSs) and 5 (Differentially private deep learning in CPSS) describe, there are some existing differentially private methods for data fusion and deep learning, respectively. Unfortunately, none of the existing methods takes both techniques into account. As a result, key aspects of data fusion are isolated when dealing with deep learning, and their interoperations fail to be realized. To address this challenging problem, we propose a Differentially Private data fusion
Conclusion
Technological advancement has made cyber–physical–social systems (CPSS) an integral part of our daily lives. However, most deep learning techniques used for CPSS are vulnerable to various privacy concerns during model training and prediction. Therefore, it is paramount to enhance the privacy of deep learning methods, especially those applied in domains with critical and sensitive information. Throughout this article, we give a thorough review of preserving privacy in data fusion and deep learning
CRediT authorship contribution statement
Nicholaus J. Gati: Conception and design of study, Writing - original draft, Writing - review & editing. Laurence T. Yang: Conception and design of study, Writing - original draft, Writing - review & editing. Jun Feng: Conception and design of study, Writing - original draft, Writing - review & editing. Xin Nie: Conception and design of study, Writing - original draft. Zhian Ren: Conception and design of study, Writing - original draft. Samwel K. Tarus: Conception and design of study, Writing -
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the National Key Research and Development Plan of China under Grant 2017YFB0801804, Grant 2018YFB1800103, and Grant 2019YFB1170062; in part by the National Natural Science Foundation of China under Grant 62002128 and Grant 61932010; and in part by the Fundamental Research Funds for the Central Universities under Grant 2018KFYXKJC046.
References (150)
- et al., A survey: Cyber-physical-social systems and their system-level design methodology, Future Gener. Comput. Syst. (2020)
- et al., Privacy-preserving computation in cyber-physical-social systems: A survey of the state-of-the-art and perspectives, Inform. Sci. (2020)
- et al., Multisensor data fusion: A review of the state-of-the-art, Inf. Fusion (2013)
- et al., A personalized hashtag recommendation approach using LDA-based topic model in microblog environment, Future Gener. Comput. Syst. (2016)
- et al., Security-aware optimization for ubiquitous computing systems with SEAT graph approach, J. Comput. System Sci. (2013)
- et al., Achieving location error tolerant barrier coverage for wireless sensor networks, Comput. Netw. (2017)
- et al., Data fusion in cyber-physical-social systems: State-of-the-art and perspectives, Inf. Fusion (2019)
- et al., Secure and efficient outsourcing differential privacy data release scheme in cyber–physical system, Future Gener. Comput. Syst. (2020)
- et al., Intelligent mobile malware detection using permission requests and API calls, Future Gener. Comput. Syst. (2020)
- et al., An effective feature engineering for DNN using hybrid PCA–GWO for intrusion detection in IoMT architecture, Comput. Commun. (2020)