Abstract
With the rapid development of deep learning, it has become increasingly common for clients to perform split learning with an untrusted cloud server. The model is split into a client end and a server end, with intermediate features transmitted in between. However, these features are typically vulnerable to attribute inference attacks on the input data. Most existing schemes protect data privacy at the inference stage, but not during training. It remains a significant challenge to remove private information from the features while accomplishing the learning task with high utility.
We find that the fundamental issue is that utility and privacy are largely conflicting objectives, which the linear scalarization commonly used in previous works handles poorly. We therefore resort to the multi-objective optimization (MOO) paradigm, seeking a Pareto optimal solution with respect to the utility and privacy objectives. The privacy objective is formulated as the mutual information between the feature and the sensitive attributes, and is approximated by Gaussian models. In each training iteration, we select a direction that balances the dual goals of moving toward the Pareto front and toward the user's preference, while keeping the privacy loss under a preset threshold. With a theoretical guarantee, the privacy of sensitive attributes is well preserved throughout training and at convergence. Experimental results on image and tabular datasets show that our method outperforms the state of the art in terms of both utility and privacy.
References
Doersch, C.: Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908 (2016)
Duong, T., Hazelton, M.L.: Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation. J. Multivar. Anal. 93(2), 417–433 (2005)
Gupta, O., Raskar, R.: Distributed learning of deep neural network over multiple agents. J. Netw. Comput. Appl. 116, 1–8 (2018)
Hershey, J.R., Olsen, P.A.: Approximating the Kullback-Leibler divergence between Gaussian mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 4, pp. IV-317. IEEE (2007)
Hillermeier, C.: Nonlinear Multiobjective Optimization: A Generalized Homotopy Approach, vol. 135. Springer Science & Business Media, Cham (2001)
Jia, J., Gong, N.Z.: AttriGuard: a practical defense against attribute inference attacks via adversarial machine learning. In: 27th USENIX Security Symposium (USENIX Security 18), pp. 513–529 (2018)
Kosinski, M., Stillwell, D., Graepel, T.: Private traits and attributes are predictable from digital records of human behavior. Proc. Natl. Acad. Sci. 110(15), 5802–5805 (2013)
Li, A., Duan, Y., Yang, H., Chen, Y., Yang, J.: TIPRDC: task-independent privacy-respecting data crowdsourcing framework for deep learning with anonymized intermediate representations. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 824–832 (2020)
Liu, S., Du, J., Shrivastava, A., Zhong, L.: Privacy adversarial network: representation learning for mobile data privacy. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3(4), 1–18 (2019)
Mahapatra, D., Rajan, V.: Multi-task learning with user preferences: gradient descent with controlled ascent in Pareto optimization. In: International Conference on Machine Learning, pp. 6597–6607. PMLR (2020)
Mahapatra, D., Rajan, V.: Exact Pareto optimal search for multi-task learning: touring the Pareto front. arXiv preprint arXiv:2108.00597 (2021)
Moyer, D., Gao, S., Brekelmans, R., Galstyan, A., Ver Steeg, G.: Invariant representations without adversarial training. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Osia, S.A., Taheri, A., Shamsabadi, A.S., Katevas, K., Haddadi, H., Rabiee, H.R.: Deep private-feature extraction. IEEE Trans. Knowl. Data Eng. 32(1), 54–66 (2018)
Pasquini, D., Ateniese, G., Bernaschi, M.: Unleashing the tiger: inference attacks on split learning. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 2113–2129 (2021)
Pittaluga, F., Koppal, S., Chakrabarti, A.: Learning privacy preserving encodings through adversarial training. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 791–799. IEEE (2019)
Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
Salamatian, S., et al.: Managing your private and public data: bringing down inference attacks against your privacy. IEEE J. Sel. Top. Signal Process. 9(7), 1240–1255 (2015)
Sener, O., Koltun, V.: Multi-task learning as multi-objective optimization. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Song, C., Shmatikov, V.: Overlearning reveals sensitive attributes. In: 8th International Conference on Learning Representations, ICLR 2020 (2020)
Weinsberg, U., Bhagat, S., Ioannidis, S., Taft, N.: Blurme: inferring and obfuscating user gender based on ratings. In: Proceedings of the Sixth ACM Conference on Recommender Systems, pp. 195–202 (2012)
Xiaochen, Z.: Feature inference attacks on split learning with an honest-but-curious server. Ph.D. thesis, National University of Singapore (2022)
Xie, Q., Dai, Z., Du, Y., Hovy, E., Neubig, G.: Controllable invariance through adversarial feature learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Yang, T.Y., Brinton, C., Mittal, P., Chiang, M., Lan, A.: Learning informative and private representations via generative adversarial networks. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 1534–1543. IEEE (2018)
Yao, A.C.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pp. 160–164. IEEE (1982)
Zheng, T., Li, B.: Infocensor: an information-theoretic framework against sensitive attribute inference and demographic disparity. In: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, pp. 437–451 (2022)
Acknowledgements
This work was supported in part by NSF China (62272306, 62136006, 62032020), and a specialized technology project for the pre-research of generic information system equipment (31511130302).
Appendices
A Notations
All notations used in this paper are listed in Table 6 for ease of reference.
B Proof for Loss Formulation
Proof
\(\textbf{Lower bound of } \textbf{I}(\textbf{z};\textbf{y}).\) According to [13], for any conditional distribution q(y|z), it holds that

\[ I(z;y) = H(y) - H(y|z) \ge H(y) + \mathbb {E}_{p(z,y)}\left[ \log q(y|z)\right] . \]

Hence the lower bound of I(z; y) can be defined as

\[ \mathcal {L} = H(y) + \mathbb {E}_{p(z,y)}\left[ \log q_{\phi }(y|z)\right] . \]

The model with parameter \(\phi \) is exactly the classifier performing the original task on the uploaded feature z. For fixed z, the better the classifier is trained, the tighter the estimate becomes. Since H(y) is a constant, we only need to optimize the second term of \(\mathcal {L}\), which translates into minimizing the cross-entropy loss. Denoting the cross-entropy loss by \(CE(\cdot )\), the utility loss can be defined as:

\[ \mathcal {L}_{utility} = CE\left( q_{\phi }(y|z),\, y\right) . \]
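To make this concrete, here is a minimal sketch (ours, not the authors' implementation; the feature dimension, class count, and the linear head are illustrative assumptions) of how the cross-entropy of a classifier \(q_{\phi }(y|z)\) over uploaded features acts as the negated, shifted lower bound:

```python
# Minimal sketch (illustrative, not the authors' code) of the variational
# lower bound I(z;y) >= H(y) + E[log q_phi(y|z)].
# FEATURE_DIM, NUM_CLASSES, and the linear head are assumed for illustration.
import torch
import torch.nn as nn

FEATURE_DIM, NUM_CLASSES = 64, 10
head = nn.Linear(FEATURE_DIM, NUM_CLASSES)  # q_phi(y|z): server-end classifier
ce = nn.CrossEntropyLoss()                  # estimates -E[log q_phi(y|z)]

z = torch.randn(128, FEATURE_DIM)           # a batch of uploaded features
y = torch.randint(0, NUM_CLASSES, (128,))   # task labels

utility_loss = ce(head(z), y)               # minimizing CE tightens the bound
# Since H(y) is constant, I(z;y) >= H(y) - utility_loss: the cross-entropy
# loss is exactly the term optimized in the utility objective.
```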
\(\textbf{Upper bound of } \textbf{I}(\textbf{z};\textbf{s}).\) Assuming s is a discrete variable, the mutual information can be written as

\[ I(z;s) = \sum _{s} p(s) \int p(z|s) \log \frac{p(z|s)}{\sum _{s'} p(s')\, p(z|s')} \, dz. \]

By Jensen's inequality, I(z; s) is upper bounded by:

\[ I(z;s) \le \sum _{s} \sum _{s'} p(s)\, p(s')\, KL\left( p(z|s) \,\Vert \, p(z|s') \right) . \]
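Under the Gaussian approximation of \(p(z|s)\), each KL term has a closed form [4], so the bound is directly computable. The sketch below is our illustration under assumed shapes and a stand-in binary attribute, not the authors' implementation:

```python
# Hedged sketch of the pairwise-KL upper bound on I(z;s) under the
# Gaussian approximation p(z|s) ~ N(mu_s, Sigma_s). The data and the
# binary attribute below are stand-ins, not the paper's setup.
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0,cov0) || N(mu1,cov1) )."""
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def mi_upper_bound(z, s):
    """sum_{s,s'} p(s) p(s') KL( p(z|s) || p(z|s') ) for discrete s."""
    values, counts = np.unique(s, return_counts=True)
    probs = counts / len(s)
    stats = [(z[s == v].mean(axis=0), np.cov(z[s == v], rowvar=False))
             for v in values]
    bound = 0.0
    for i, (mu_i, cov_i) in enumerate(stats):
        for j, (mu_j, cov_j) in enumerate(stats):
            if i != j:  # KL of a distribution with itself is 0
                bound += probs[i] * probs[j] * gaussian_kl(mu_i, cov_i,
                                                           mu_j, cov_j)
    return bound

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 8))        # uploaded features (stand-in)
s = rng.integers(0, 2, size=500)     # binary sensitive attribute
print(mi_upper_bound(z, s))          # the bound from the inequality above
```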
C Proof of Lemma 2
Proof
We prove by contradiction. Since \(C = G^T G\) and \(\boldsymbol{d} = G\boldsymbol{\beta }\), we have \(C\boldsymbol{\beta } = G^T \boldsymbol{d}\), so the objective in (10) can be written as

\[ ||C\boldsymbol{\beta } - \boldsymbol{a}||_2^2 = ||G^T \boldsymbol{d}||_2^2 - 2\boldsymbol{a}^T G^T \boldsymbol{d} + ||\boldsymbol{a}||_2^2. \]

If \(\boldsymbol{a}^T G^T \boldsymbol{d} < 0\) at convergence, then \(||C\boldsymbol{\beta } - \boldsymbol{a}||_2^2 > ||\boldsymbol{a}||_2^2\), i.e., the objective exceeds the value attained at the feasible point \(\boldsymbol{d} = \boldsymbol{0}\). This contradicts the optimality of the current solution. Thus \(\boldsymbol{a}^T G^T \boldsymbol{d} \ge 0\) holds when solving (10).
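The sign condition can be checked numerically on a random instance. In the sketch below (ours; the shapes and the random instance are assumptions, with \(C = G^T G\) and \(\boldsymbol{d} = G\boldsymbol{\beta }\) as in the construction above):

```python
# Numeric check of Lemma 2 (illustrative): solve the box-constrained
# least-squares problem min ||C beta - a||_2^2, beta in [-1,1]^m, with
# C = G^T G and d = G beta, then verify a^T G^T d >= 0.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(1)
m, p = 3, 10                    # m objectives, p model parameters (assumed)
G = rng.normal(size=(p, m))     # columns are per-objective gradients
a = rng.normal(size=m)          # anchor vector of the preference term
C = G.T @ G

res = lsq_linear(C, a, bounds=(-1.0, 1.0))  # box-constrained least squares
d = G @ res.x
assert a @ (G.T @ d) >= -1e-9   # holds at the optimum, as Lemma 2 states
```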
D Proof of Lemma 3
Proof
When rank(C) = rank(G) = m, we can always find \(\boldsymbol{\beta }^{*}\) such that \(C\boldsymbol{\beta }^{*} = \boldsymbol{l}\). Due to the range constraint on \(\boldsymbol{\beta }\), we scale \(\boldsymbol{\beta }^{*}\) into \([-1,1]^m\) without altering the direction of \(C\boldsymbol{\beta }^{*}\). Hence we obtain \(G^T \boldsymbol{d} = C \boldsymbol{\beta } = k \boldsymbol{l} \succ \boldsymbol{0}\) for some \(k>0\), and \(\boldsymbol{a}^T G^T \boldsymbol{d} = k \boldsymbol{a}^T \boldsymbol{l} = 0\), where the last equality holds by our claim that \(\boldsymbol{a}^T \boldsymbol{l} = 0\).
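For intuition, the construction can be replayed numerically in the full-rank case. The sketch below is illustrative; the strictly positive \(\boldsymbol{l}\) is our assumption, whereas in the paper \(\boldsymbol{l}\) additionally satisfies \(\boldsymbol{a}^T \boldsymbol{l} = 0\):

```python
# Illustrative replay of Lemma 3's construction in the full-rank case:
# solve C beta* = l, then shrink beta* into [-1,1]^m, which keeps the
# direction of C beta*, so that G^T d = k*l with k > 0.
import numpy as np

rng = np.random.default_rng(2)
m, p = 3, 10
G = rng.normal(size=(p, m))
C = G.T @ G                           # rank m almost surely here
l = np.abs(rng.normal(size=m)) + 0.1  # assumed strictly positive target

beta_star = np.linalg.solve(C, l)
k = min(1.0, 1.0 / np.max(np.abs(beta_star)))  # scale into the box
d = G @ (k * beta_star)
assert np.allclose(G.T @ d, k * l)    # G^T d = k*l, every entry positive
```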
E Proof of Theorem 1
Proof
If \(\theta \) is a Pareto optimal solution of (12), then by Pareto criticality ([5], Chap. 4), G has rank \(m-1\). Hence rank(C) = rank(G) = \(m-1\). Note that \(\boldsymbol{a} \ne \boldsymbol{0}\) at this point; otherwise the optimization would have terminated with \(\boldsymbol{d} = \boldsymbol{0}\).
Let \( \mathcal {S} = \{ C \boldsymbol{\beta } \mid \boldsymbol{\beta } \in [-1,1]^m\}\), let Col(C) be the column space of C, and let Null(C) be the null space of C. Clearly \(\mathcal {S} \subseteq Col(C)\), and Col(C) and Null(C) are orthogonal complements. By minimizing the objective of (12), \(C \boldsymbol{\beta }\) becomes the best approximation of \(\boldsymbol{a}\) in \(\mathcal {S}\). We now prove by contradiction. Assume \(\boldsymbol{d} = \boldsymbol{0}\); then \(C \boldsymbol{\beta } = G^{T} \boldsymbol{d} = \boldsymbol{0}\), so the orthogonal projection of \(\boldsymbol{a}\) onto \(\mathcal {S}\), and hence onto Col(C), is \(\boldsymbol{0}\), which means \(\boldsymbol{a} \in Null(C)\). Since rank(C) = \(m-1\), Null(C) is one-dimensional and thus spanned by \(\boldsymbol{a}\); as \(\boldsymbol{l}\) is orthogonal to \(\boldsymbol{a}\), we get \(\boldsymbol{l} \in Col(C)\), which means \(\exists \boldsymbol{\alpha } \ \mathrm {s.t.} \ C \boldsymbol{\alpha } = G^T G \boldsymbol{\alpha } = \boldsymbol{l} \succeq \boldsymbol{0}\). So \(\boldsymbol{d} = G \boldsymbol{\alpha }\) is a direction that decreases all losses, contradicting Pareto optimality. Therefore, \(\boldsymbol{d} \ne \boldsymbol{0}\).
According to Lemmas 1, 2, and 3, the direction \(\boldsymbol{d}\) found by (12) always satisfies \(\boldsymbol{a}^T G^T \boldsymbol{d} \ge 0\), so with a proper step size, \(\mu \) keeps decreasing.