Elsevier

Future Generation Computer Systems

Volume 74, September 2017, Pages 76-85
Multi-key privacy-preserving deep learning in cloud computing

https://doi.org/10.1016/j.future.2017.02.006

Highlights

  • In the basic scheme, we use MK-FHE as our privacy-preserving technique. Only the decryption operation requires interaction among data owners.

  • In the advanced scheme, we propose a hybrid structure that combines the double decryption mechanism and FHE.

  • In the advanced scheme, data providers perform only the encryption and decryption algorithms.

  • We prove that these two multi-key privacy-preserving deep learning schemes over encrypted data are secure.

Abstract

Deep learning has attracted a lot of attention and has been applied successfully in many areas such as bioinformatics, image processing, game playing, and computer security. However, deep learning usually requires a large amount of training data, which may not be available from a single owner. As the volume of data grows, users commonly store their data in a third-party cloud. Because the data are confidential, they are usually stored in encrypted form. Applying deep learning to such datasets, owned by multiple data owners in the cloud, raises two challenges: (i) the data are encrypted with different keys, and all operations, including intermediate results, must be secure; and (ii) the computational and communication costs for the data owner(s) should be kept minimal. In this work, we propose two schemes to solve these problems. We first present a basic scheme based on multi-key fully homomorphic encryption (MK-FHE). We then propose an advanced scheme based on a hybrid structure that combines the double decryption mechanism and fully homomorphic encryption (FHE). We also prove that both multi-key privacy-preserving deep learning schemes over encrypted data are secure.

Introduction

Cloud computing provides fundamental support for shared computing resources, including computation, storage, networking, and analytical software. The use of these resources has fostered impressive advances in Big Data [1], [2] and the Internet of Things (IoT). Because of this wide range of applications, cloud security for data management, storage, and processing has received increasing attention in recent years. Meanwhile, deep learning has achieved impressive success in a wide range of applications, such as bioinformatics, image processing, game playing, natural language processing, and computer security. However, deep learning needs many training records to determine tens of thousands of parameters and obtain accurate results without over-fitting. In many cases, such a massive amount of training data cannot be provided by a single user and must instead be collected from different users. Because the datasets are confidential and sensitive, the trend is to store them in encrypted form in an untrusted third-party cloud system.

To take advantage of the computing power of the cloud platform, more and more applications perform deep learning on the cloud. When the number of training records is huge, it is difficult for a single user to download all of them for processing. This poses a new challenge to the community: how to perform deep learning over outsourced encrypted data owned by multiple users in the cloud. In this paper, we develop a multi-key privacy-preserving deep neural network that runs in the cloud over encrypted data. The key challenges are as follows.

(1) The data are located in different places and encrypted with different keys. To protect data privacy, three things must be secure:
  • all computations, such as the inner product and the approximation of the nonlinear sigmoid function used in deep learning;
  • the intermediate results generated during the deep learning process;
  • the learning results.

(2) To improve the efficiency of the deep learning process, the cloud server should do the computation so as to reduce the computation and communication costs of the data owner(s).

Existing solutions such as secure multi-party computation (SMC) [3], encryption schemes, garbled circuits, and detective controls were designed for other scenarios and cannot be applied directly to these two challenges.
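Homomorphic encryption natively supports only additions and multiplications. The nonlinear sigmoid mentioned in challenge (1) must therefore be replaced by a polynomial before it can be evaluated on ciphertexts. This excerpt does not say which approximation the authors use. The minimal plaintext sketch below assumes a degree-5 Maclaurin (Taylor) series, $\sigma(z) \approx 1/2 + z/4 - z^3/48 + z^5/480$, and applies it to an inner product. The function names are hypothetical.

```python
import math

def inner_product(weights, inputs):
    """Weighted sum w . x: only additions and multiplications, so it maps onto HE operations."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(z):
    """Exact logistic sigmoid, used here only as a reference."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_poly(z):
    """Assumed degree-5 Maclaurin approximation of the sigmoid (illustrative only).

    It uses only additions, multiplications, and multiplications by plaintext
    constants, so it could be evaluated homomorphically; it is accurate only for small |z|.
    """
    return 0.5 + z / 4 - z ** 3 / 48 + z ** 5 / 480

w = [0.2, -0.1, 0.4]
x = [1.0, 0.5, -0.5]
z = inner_product(w, x)          # approximately -0.05
print(f"z = {z:.4f}, sigmoid = {sigmoid(z):.6f}, poly = {sigmoid_poly(z):.6f}")
```

The approximation is close near $z = 0$ but diverges for large $|z|$; at $z = 4$ it already exceeds 1. For this reason, a scheme of this kind has to keep pre-activations in a bounded range or use a better-fitted polynomial.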

Our Contributions. To solve the above challenges, this paper designs two schemes that support a multi-key learning system. Both schemes allow multiple data owners with different datasets to learn a neural network model collaboratively and securely in cloud computing. To protect confidentiality, data owners encrypt their sensitive data with different public keys before uploading it to the cloud server.

We first propose a basic scheme based on multi-key fully homomorphic encryption (MK-FHE) [4], [5], [6]. In this scheme, multiple data owners send their data to an untrusted cloud server. Each owner encrypts its data under a public key chosen independently of the others. The cloud server computes the output of deep learning on this joint data and sends it back to all participating data owners. Finally, the data owners jointly run an SMC protocol to decrypt the encrypted deep learning results and extract the final results.
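The basic scheme ends with the data owners jointly decrypting the cloud's result, and this excerpt does not specify that protocol. Consider LWE-based MK-FHE schemes, such as the two-round construction of Mukherjee and Wichs listed in the references. There, an evaluated ciphertext decrypts under the concatenation of all owners' secret keys, so decryption splits into one inner product per owner. Each owner publishes a partial decryption computed with only its own key, plus fresh "smudging" noise, and the shares are then summed and rounded.

The toy sketch below shows only that final step, for single-bit plaintexts, with insecure parameters. `toy_joint_encrypt` is a hypothetical test helper that knows every key, solely to manufacture a ciphertext of the right shape. In a real scheme, that ciphertext would come from homomorphic evaluation. All names are illustrative, and this is not the authors' SMC protocol.

```python
import random

Q = 2 ** 15        # toy modulus (far too small to be secure)
DIM = 8            # toy LWE dimension per party
N_PARTIES = 3

def keygen(rng):
    """Each data owner samples its own secret vector independently."""
    return [rng.randrange(Q) for _ in range(DIM)]

def dot(a, s):
    return sum(x * y for x, y in zip(a, s)) % Q

def toy_joint_encrypt(bit, secrets, rng):
    """HYPOTHETICAL TEST HELPER: builds (b, a_1, ..., a_n) decryptable under the
    concatenated key (s_1, ..., s_n). No real party ever holds all the keys."""
    parts = [[rng.randrange(Q) for _ in range(DIM)] for _ in secrets]
    noise = rng.randint(-4, 4)
    b = (sum(dot(a, s) for a, s in zip(parts, secrets)) + noise + bit * (Q // 2)) % Q
    return b, parts

def partial_decrypt(a_i, s_i, rng):
    """Party i uses only its own key; the extra noise stands in for smudging."""
    return (dot(a_i, s_i) + rng.randint(-8, 8)) % Q

def combine(b, partials):
    """Subtract all shares, then round to the nearest multiple of Q/2."""
    v = (b - sum(partials)) % Q
    return 1 if Q // 4 <= v < 3 * Q // 4 else 0

rng = random.Random(2017)
secrets = [keygen(rng) for _ in range(N_PARTIES)]
for bit in (0, 1):
    b, parts = toy_joint_encrypt(bit, secrets, rng)
    shares = [partial_decrypt(a, s, rng) for a, s in zip(parts, secrets)]
    print(bit, "->", combine(b, shares))
```

In a real protocol, the smudging noise must be much larger than the ciphertext noise so that a single share leaks nothing about an owner's key. The toy values here are chosen only so that rounding always succeeds (total noise at most 28, far below $Q/4$).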

To avoid interaction among multiple data owners, we further propose an advanced scheme based on a hybrid structure. It combines the double decryption mechanism (the BCP scheme [7]) and fully homomorphic encryption (FHE) [8]. Suppose we used only the BCP scheme to support secure computation. In the training phase, both the inner product of the inputs and weights and the activation function would then require additional communication with the cloud server. To solve this problem, we introduce the FHE scheme by transforming BCP ciphertexts directly into FHE ciphertexts, so that computations over FHE ciphertexts can be carried out without interaction.

This scheme involves a cloud server C and an authorized center AU (a trusted third party), which are assumed to be non-colluding and honest-but-curious. The cloud server C stores the encrypted datasets uploaded by multiple data owners under their different public keys. The authorized center AU holds only the master key for the master decryption of the BCP scheme and the private key of FHE. In this paper, all participants are assumed to be honest-but-curious.

Our contributions are summarized as follows:

  • We address multi-key privacy-preserving deep learning in cloud computing by proposing two schemes that allow multiple data owners to conduct privacy-preserving deep learning collaboratively.

  • Our multi-key privacy-preserving deep learning schemes preserve the privacy of sensitive data, intermediate results, and the trained model.

  • We provide a security analysis showing that both proposed schemes preserve privacy.

  • We give an application of our advanced scheme to face recognition. Our solutions are generic and can be applied to many other machine learning tasks in the same setting.

Organization. The rest of this paper is organized as follows. Section 2 briefly discusses related work. Section 3 introduces the preliminaries: deep learning, stochastic gradient descent, the BCP scheme, FHE, and MK-FHE. Section 4 defines the system model, and Section 5 describes the details of our privacy-preserving deep learning system. Section 6 presents the complexity and security analysis of the proposed system. Section 7 gives an application of our system. Finally, Section 8 concludes the paper.

Section snippets

Deep learning

In cloud computing, deep learning has succeeded in many applications, such as image recognition [9], [10], speech recognition [11], and biomedical data analysis [12]. Deep learning transforms the original data into higher-level, more abstract representations. In particular, high-dimensional original data can be converted into low-dimensional data by training a multilayer neural network with a small central layer to reconstruct the high-dimensional input data. Through these transformations,

Deep learning

Deep learning can be viewed as a multi-layer neural network. The input data, that is, the variables we are able to observe, are presented at the input layer. Several hidden layers then extract increasingly abstract features from the input layer. They are called "hidden" because the parameters for these layers are not given in the data. During the learning process, the model must determine which features are useful for explaining the relationships in the input data. More precisely, we take a

Multi-key privacy-preserving deep learning system

In a deep learning system, each data owner holds a sensitive dataset $DB_i$ whose local resources are fully administered by that owner. In our multi-key system, we consider $n$ such data owners, denoted by $P_1, \ldots, P_n$. Each data owner $P_i$ ($i \in [1, n]$) has its own public/private key pair $(pk_i, sk_i)$. Its local sensitive dataset $DB_i$ has $I_i$ attributes $\{X_1^{(i)}, X_2^{(i)}, \ldots, X_{I_i}^{(i)}\}$, and $DB_1 \cap DB_2 \cap \cdots \cap DB_n = \emptyset$. These data owners want to perform deep learning collaboratively with one another. Due to the

The basic scheme

In this subsection, we give a basic scheme (refer to Fig. 2) for a scenario in which multiple data owners want to learn the parameters $W^{(1)}, W^{(2)}$ collaboratively from their partitioned data, without leaking information about their sensitive datasets.
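The sketch below runs the plaintext computation that learning $W^{(1)}$ and $W^{(2)}$ involves: stochastic gradient descent with backpropagation on a network with one hidden layer. It assumes sigmoid activations, a squared-error loss, no bias terms, and a fixed learning rate. This excerpt does not give these details, so they are illustrative choices, and all names are hypothetical. In the encrypted setting, these additions and multiplications are presumably what the cloud would evaluate homomorphically, with the sigmoid replaced by a polynomial approximation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    """Row-major matrix-vector product: each output entry is an inner product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, W2, x):
    h = [sigmoid(z) for z in matvec(W1, x)]      # hidden-layer activations
    y = [sigmoid(z) for z in matvec(W2, h)]      # output-layer activations
    return h, y

def loss(W1, W2, x, t):
    """Squared-error loss E = 0.5 * sum((y - t)^2) on one record."""
    _, y = forward(W1, W2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def sgd_step(W1, W2, x, t, lr=0.5):
    """One SGD update of (W1, W2) on a single training record (x, t)."""
    h, y = forward(W1, W2, x)
    # Output deltas: dE/dz2_k = (y_k - t_k) * y_k * (1 - y_k)
    d2 = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
    # Hidden deltas: back-propagate d2 through the current W2, times h_j * (1 - h_j)
    d1 = [sum(W2[k][j] * d2[k] for k in range(len(d2))) * h[j] * (1 - h[j])
          for j in range(len(h))]
    # Gradients: dE/dW2[k][j] = d2[k] * h[j] and dE/dW1[j][i] = d1[j] * x[i]
    W2 = [[w - lr * d2[k] * h[j] for j, w in enumerate(row)] for k, row in enumerate(W2)]
    W1 = [[w - lr * d1[j] * x[i] for i, w in enumerate(row)] for j, row in enumerate(W1)]
    return W1, W2

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 hidden
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 hidden -> 2 outputs
x, t = [1.0, 0.0, 1.0], [1.0, 0.0]
before = loss(W1, W2, x, t)
for _ in range(100):
    W1, W2 = sgd_step(W1, W2, x, t)
print(f"loss before {before:.4f}, after {loss(W1, W2, x, t):.4f}")
```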

Main idea. Generally speaking, SMC cannot handle data encrypted under different public keys; it can only operate on ciphertexts under the same public key. In our basic scheme, to preserve the data privacy when multiple parties are

Security analysis

Our system aims to protect the privacy of the input data, the intermediate results, and the output results of deep learning under the semi-honest model, i.e., all participants in our two schemes are assumed to be semi-honest. We now recall the definition of semantic security [41], i.e., indistinguishability under chosen-plaintext attack (IND-CPA security).
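IND-CPA is usually phrased as a game with four steps:
  • the challenger generates $(pk, sk)$;
  • the adversary, holding $pk$, chooses two messages $m_0, m_1$;
  • the adversary receives $\mathsf{Enc}(pk, m_b)$ for a secret random bit $b$;
  • the adversary outputs a guess $b'$.

The scheme is IND-CPA secure if every efficient adversary's advantage $|\Pr[b' = b] - 1/2|$ is negligible. The sketch below plays this game against textbook (unpadded) RSA. RSA is a deterministic scheme, chosen here purely as a counterexample; it is not part of this paper. Because the adversary can re-encrypt $m_0$ under $pk$ and compare the result with the challenge, it wins every round. This is why IND-CPA-secure schemes, BCP included, must be randomized. Parameters are toy values, and all names are hypothetical.

```python
import random

# Toy textbook-RSA public key (insecure, illustration only): n = 61 * 53.
# e = 17 is coprime to phi(n) = 60 * 52 = 3120, so encryption permutes Z_n.
N_RSA, E_RSA = 3233, 17

def rsa_encrypt(pk, m):
    """Deterministic encryption: the same m always yields the same ciphertext."""
    n, e = pk
    return pow(m, e, n)

def adversary_guess(pk, m0, challenge):
    """Re-encrypt m0 with the public key and compare it with the challenge."""
    return 0 if rsa_encrypt(pk, m0) == challenge else 1

def ind_cpa_experiment(trials, rng):
    """Play the IND-CPA game `trials` times; return the adversary's success rate."""
    pk = (N_RSA, E_RSA)
    wins = 0
    for _ in range(trials):
        m0, m1 = rng.sample(range(2, N_RSA), 2)   # two distinct challenge messages
        b = rng.randrange(2)                      # challenger's secret bit
        challenge = rsa_encrypt(pk, (m0, m1)[b])
        if adversary_guess(pk, m0, challenge) == b:
            wins += 1
    return wins / trials

rate = ind_cpa_experiment(1000, random.Random(6))
print(f"success rate {rate:.3f}, advantage {abs(rate - 0.5):.3f}")  # success rate 1.000, advantage 0.500
```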

Definition 6.1 (Semantic Security (SS), IND-CPA)

A public-key encryption scheme $\mathcal{E} = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec})$ is semantically secure, if for any

Application

In this section, we show an application of our advanced scheme in face recognition.

Privacy-Preserving Face Recognition. As a typical biometric authentication technique, face recognition is increasingly applied in real life. Its widespread use raises many privacy concerns, especially when computation is outsourced to an untrusted cloud server.

Assume there is an image sample set containing $n \times m$ grayscale images of $n$ persons $P_1, \ldots, P_n$ (e.g., $n = 20$) in various poses, and each

Conclusions and future work

In this paper, we focused on the privacy issues of collaborative deep learning in cloud computing. We proposed two schemes, a basic scheme and an advanced scheme, to protect privacy in deep learning. The basic scheme is based on an MK-FHE scheme, and the advanced scheme is based on a hybrid structure that combines the double decryption mechanism with an FHE scheme. Both schemes are able to tackle the problem of privacy-preserving collaborative deep learning over ciphertexts encrypted under different public

Acknowledgments

This work was supported by National Natural Science Foundation of China (No. 61472091), Natural Science Foundation of Guangdong Province for Distinguished Young Scholars (2014A030306020), Science and Technology Planning Project of Guangdong Province, China (2015B010129015) and the Innovation Team Project of Guangdong Universities (No. 2015KCXTD014).

Ping Li received the M.S. and Ph.D. degrees in mathematics from Sun Yat-sen University in 2010 and 2016, respectively. Currently, she is a postdoctoral researcher at Guangzhou University. Her main research interests include cryptography, privacy preservation, and cloud computing.

References (41)

  • V. Chang, Towards a big data system disaster recovery in a private cloud, Ad Hoc Networks (2015).
  • V. Chang et al., Cloud computing adoption framework: A security framework for business clouds, Future Gener. Comput. Syst. (2016).
  • S. Goldwasser et al., Probabilistic encryption, J. Comput. System Sci. (1984).
  • Z.W. Wang et al., ABE with improved auxiliary input for big data security, J. Comput. System Sci. (2016).
  • O. Goldreich, Secure multi-party computation. Manuscript. Preliminary version, 1998, pp....
  • A. López-Alt et al., On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption.
  • P. Mukherjee et al., Two round multiparty computation via multi-key FHE.
  • P. Mukherjee, D. Wichs, Two Round MPC from LWE via Multi-Key FHE. IACR Cryptology ePrint Archive, 2015, p....
  • E. Bresson, D. Catalano, D. Pointcheval, A simple public-key cryptosystem with a double trapdoor decryption mechanism...
  • C. Gentry, Fully homomorphic encryption using ideal lattices, in: Symposium on the Theory of Computing,...
  • T.H. Chan et al., PCANet: A simple deep learning baseline for image classification?, IEEE Trans. Image Process. (2015).
  • A. Graves, A.R. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: ICASSP,...
  • G. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, Signal Process. Mag. (2012).
  • M. Liang et al., Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach, IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) (2015).
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006).
  • V. Chang et al., Towards achieving data security with the cloud computing adoption framework, IEEE Trans. Serv. Comput. (2016).
  • G. Sun et al., User-defined privacy location-sharing system in mobile online social networks, J. Netw. Comput. Appl. (2016).
  • G. Sun et al., L2P2: A location-label based approach for privacy preserving in LBS, Future Gener. Comput. Syst. (2016).
  • J. Li et al., Secure deduplication with efficient and reliable convergent key management, IEEE Trans. Parallel Distrib. Syst. (2014).
  • J. Li et al., A hybrid cloud approach for secure authorized deduplication, IEEE Trans. Parallel Distrib. Syst. (2015).
Jin Li received the B.S. degree in mathematics from Southwest University in 2002 and the Ph.D. degree in information security from Sun Yat-sen University in 2007. Currently, he is a professor at Guangzhou University. He has been selected as one of the Science and Technology New Stars of Guangdong Province. His research interests include applied cryptography and security in cloud computing. He has published more than 70 research papers in refereed international conferences and journals and has served as program chair or program committee member for many international conferences.

Zhengan Huang received his B.S. and M.S. degrees from the Department of Mathematics, Sun Yat-sen University, in 2009 and 2011, respectively, and his Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, in 2015. He served as a security engineer at Huawei Technologies Co. Ltd. from 2015 to 2016. Currently, he is a postdoctoral researcher at Guangzhou University. His research interests include public-key cryptography and information security.

Tong Li received his B.S. (2011) and M.S. (2014) degrees in Computer Science and Technology from Taiyuan University of Technology and Beijing University of Technology, respectively. Currently, he is a Ph.D. candidate at Nankai University. His research interests include applied cryptography and data privacy protection in cloud computing.

Chong-Zhi Gao received his Ph.D. (2004) in applied mathematics from Sun Yat-sen University. Currently, he is a professor at the School of Computer Science of Guangzhou University. His research interests include cryptography and privacy in machine learning.

Siu-Ming Yiu received a B.S. in Computer Science from the Chinese University of Hong Kong, an M.S. in Computer and Information Science from Temple University, and a Ph.D. in Computer Science from The University of Hong Kong. Currently, he is an associate professor at the University of Hong Kong. His research interests include bioinformatics, computer security, and cryptography.

Kai Chen received the B.S. degree from Nanjing University, China, in 2004, and the Ph.D. degree from the University of Chinese Academy of Sciences in 2010. He is an Associate Professor in the Institute of Information Engineering, Chinese Academy of Sciences. He was also a postdoc at Pennsylvania State University, State College, PA, USA. His research interests include software security, security testing on smartphones, and privacy protection in social networks.
