
Towards Understanding the fairness of differentially private margin classifiers


Abstract

Margin classifiers, such as Support Vector Machines, play a critical role in high-stakes decision domains. In recent years, differential privacy has been widely employed in margin classifiers to protect user privacy. However, incorporating differential privacy into margin classifiers may cause a fairness issue, in the sense that differentially private margin classifiers can have significantly different true positive rates on groups determined by sensitive attributes (e.g., race). To address this issue, we set out to identify the factor that dominates the fairness of differentially private margin classifiers through well-designed experiments and further analysis. We first conduct an empirical study of three classical margin classifiers, each learned via three representative differentially private empirical risk minimization algorithms. The empirical results show that the fairness of differentially private margin classifiers strongly depends on the fairness of their non-private counterparts. We then analyze how differential privacy impacts the fairness of margin classifiers and confirm the empirical findings. In general, our study shows that when non-private margin classifiers are fair, the fairness of their differentially private counterparts can be ensured.


Notes

  1. https://github.com/propublica/Compas-analysis

  2. https://github.com/sunblaze-ucb/dpml-benchmark

References

  1. de Paula, D.A.V., Artes, R., Ayres, F., Minardi, A.: Estimating credit and profit scoring of a Brazilian credit union with logistic regression and machine-learning techniques. RAUSP Manage. J. 54, 321–336 (2019)


  2. Zhang, L., Hu, H., Zhang, D.: A credit risk assessment model based on SVM for small and medium enterprises in supply chain finance. Financ. Innov. 1(14), 1–21 (2015)


  3. Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318 (2016)

  4. Iyengar, R., Near, J.P., Song, D., Thakkar, O., Thakurta, A., Wang, L.: Towards practical differentially private convex optimization. In: Proceedings of 2019 IEEE Symposium on Security and Privacy (SP), pp. 299–316. IEEE (2019)

  5. Wu, X., Li, F., Kumar, A., Chaudhuri, K., Jha, S., Naughton, J.: Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1307–1322 (2017)

  6. Yu, D., Zhang, H., Chen, W., Liu, T.-Y.: Do not let privacy overbill utility: Gradient embedding perturbation for private learning. In: ICLR 2021 (2021)

  7. Zhou, Y., Wu, S., Banerjee, A.: Bypassing the ambient dimension: Private SGD with gradient subspace identification. In: International Conference on Learning Representations (2021)

  8. Huang, X., Ding, Y., Jiang, Z.L., Qi, S., Wang, X., Liao, Q.: Dp-fl: a novel differentially private federated learning framework for the unbalanced data. World Wide Web 23(4), 2529–2545 (2020)


  9. Chouldechova, A., Roth, A.: A snapshot of the frontiers of fairness in machine learning. Commun. ACM 63(5), 82–89 (2020)


  10. Ranjbar Kermany, N., Zhao, W., Yang, J., Wu, J., Pizzato, L.: A fairness-aware multi-stakeholder recommender system. World Wide Web 24(6), 1995–2018 (2021)


  11. Donini, M., Oneto, L., Ben-David, S., Shawe-Taylor, J., Pontil, M.: Empirical risk minimization under fairness constraints. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18, pp. 2796–2806. Curran Associates Inc., (2018)

  12. Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems, pp. 3315–3323 (2016)

  13. Mandal, D., Deng, S., Jana, S., Wing, J., Hsu, D.J.: Ensuring fairness beyond the training data. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 18445–18456 (2020)

  14. Roh, Y., Lee, K., Whang, S.E., Suh, C.: Fairbatch: Batch selection for model fairness. In: International Conference on Learning Representations (2021)

  15. Hu, R., Zhu, X., Zhu, Y., Gan, J.: Robust SVM with adaptive graph learning. World Wide Web 23(3), 1945–1968 (2020)


  16. Bagdasaryan, E., Poursaeed, O., Shmatikov, V.: Differential privacy has disparate impact on model accuracy. In: Advances in Neural Information Processing Systems, pp. 15479–15488 (2019)

  17. Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: Impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice. PPMLP’20, pp. 15–19. Association for Computing Machinery, (2020)

  18. Berk, R., Heidari, H., Jabbari, S., Kearns, M., Roth, A.: Fairness in criminal justice risk assessments: The state of the art. Sociol. Meth. Res. 50(1), 3–44 (2021)


  19. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226 (2012)

  20. Hebert-Johnson, U., Kim, M., Reingold, O., Rothblum, G.: Multicalibration: Calibration for the (Computationally-identifiable) masses. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1939–1948 (2018)

  21. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research 12(3) (2011)

  22. Bassily, R., Smith, A., Thakurta, A.: Private empirical risk minimization: Efficient algorithms and tight error bounds. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 464–473. IEEE (2014)

  23. Su, D., Cao, J., Li, N., Bertino, E., Lyu, M., Jin, H.: Differentially private k-means clustering and a hybrid approach to private optimization. ACM Trans. Priv. Sec. (TOPS) 20(4), 1–33 (2017)


  24. Jain, P., Kothari, P., Thakurta, A.: Differentially private online learning. In: Proceedings of Conference on Learning Theory, pp. 24–1 (2012)

  25. Bu, Z., Dong, J., Long, Q., Su, W.J.: Deep learning with Gaussian differential privacy. Harvard Data Science Review 2020(23) (2020)

  26. Jagielski, M., Kearns, M., Mao, J., Oprea, A., Roth, A., Sharifi-Malvajerdi, S., Ullman, J.: Differentially private fair learning. In: International Conference on Machine Learning, pp. 3000–3008. PMLR (2019)

  27. Cummings, R., Gupta, V., Kimpara, D., Morgenstern, J.: On the compatibility of privacy and fairness. In: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization. UMAP’19 Adjunct, pp. 309–315. Association for Computing Machinery, (2019)

  28. Ding, J., Zhang, X., Li, X., Wang, J., Yu, R., Pan, M.: Differentially private and fair classification via calibrated functional mechanism. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 622–629 (2020)

  29. Khalili, M.M., Zhang, X., Abroshan, M., Sojoudi, S.: Improving fairness and privacy in selection problems. In: Proceedings of the AAAI Conference on Artificial Intelligence (2021)

  30. Mozannar, H., Ohannessian, M.I., Srebro, N.: Fair learning with private demographic data. arXiv preprint arXiv:2002.11651 (2020)

  31. Tran, C., Fioretto, F., Hentenryck, P.V.: Differentially private and fair deep learning: A lagrangian dual approach. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 9932–9939 (2021)

  32. Xu, D., Du, W., Wu, X.: Removing disparate impact of differentially private stochastic gradient descent on model accuracy. arXiv preprint arXiv:2003.03699 (2020)

  33. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press (2012)

  34. Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Large margin classifiers: convex loss, low noise, and convergence rates. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1173–1180 (2004)

  35. Dwork, C.: Differential privacy. In: Proceedings of Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Part II, pp. 1–12 (2006)

  36. Zafar, M.B., Valera, I., Gomez Rodriguez, M., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web. WWW ’17, pp. 1171–1180. International World Wide Web Conferences Steering Committee, (2017)

  37. Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml

  38. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 17(1), 113–126 (2004)


  39. Rahimi, A., Recht, B.: Uniform approximation of functions with random bases. In: 2008 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 555–561 (2008)

  40. Heaven, D.: Why deep-learning AIs are so easy to fool. Nature, 163–166 (2019)

  41. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 206–215 (2019)

  42. Xue, M., He, C., Wang, J., Liu, W.: One-to-n & n-to-one: Two advanced backdoor attacks against deep learning models. IEEE Transactions on Dependable and Secure Computing (2020)

  43. Dasgupta, S., Schulman, L.: A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. J. Mach. Learn. Res. 8, 203–226 (2007)


  44. Rakhlin, A., Shamir, O., Sridharan, K.: Making gradient descent optimal for strongly convex stochastic optimization. In: Proceedings of the 29th International Conference on Machine Learning. ICML'12, pp. 1571–1578. Omnipress (2012)


Acknowledgements

This work is supported by the National Key R&D Program of China (2019YFE0103800) and the Natural Science Foundation of China (U1836207). We thank Professor X. Sean Wang and Chuanwang Wang for their insightful comments.

Author information


Corresponding author

Correspondence to Weili Han.

Appendices

A: Deviation properties of AMP, PSGD, DPSGD

In this section, we identify the deviation properties of AMP, PSGD and DPSGD.

[Algorithm 1: pseudocode of AMP]

The pseudocodes of AMP are shown in Algorithm 1. According to the design of AMP, we identify its deviation property as follows.

Theorem 1

AMP follows \(\left(\frac{n\gamma}{\Lambda}+\sqrt{2p\log \frac{2}{\alpha}}\left(\frac{4L}{\Lambda \epsilon_3}\left(1+\sqrt{2\log \frac{1}{\delta_1}}\right) + \frac{n\gamma}{\Lambda \epsilon_2}\left(1+\sqrt{2\log \frac{1}{\delta_2}}\right)\right), \alpha\right)\)-deviation.

Proof of Theorem 1

The utility guarantee of AMP consists of two parts. First, it bounds the distance between the optimal model parameters \(\theta_{approx}\) of the perturbed (private) loss function and the optimal model parameters \(\theta^*\) of the non-private loss function. Second, it bounds the distance between the private output \(\theta_{out}\) and \(\theta_{approx}\). The first bound is \(\frac{2n\left\| \mathbf{b_1} \right\|_2}{\Lambda}\) (see inequality 10 of [4]). The second bound is \(\frac{n\gamma}{\Lambda} + \left\| \mathbf{b_2} \right\|_2\) (see inequality 5 of [4]). Therefore, the total deviation of the model parameters is bounded by \(\frac{n(\gamma+2\left\| \mathbf{b_1} \right\|_2)}{\Lambda} + \left\| \mathbf{b_2} \right\|_2\), where \(\mathbf{b_1}\) and \(\mathbf{b_2}\) are distributed as \(\mathcal{N}(0,\sigma_1^2 I_{p\times p})\) and \(\mathcal{N}(0,\sigma_2^2 I_{p\times p})\), respectively, with \(\sigma_1 = \frac{\frac{2L}{n}(1+\sqrt{2\log \frac{1}{\delta_1}})}{\epsilon_3}\) and \(\sigma_2 = \frac{\frac{n\gamma}{\Lambda}(1+\sqrt{2\log \frac{1}{\delta_2}})}{\epsilon_2}\).

According to Lemma 2 in [43]: with probability \(\ge 1-\frac{\alpha }{2}\),

$$\begin{aligned} \left\| \mathbf {b_s} \right\| _2 \le \sigma _s\sqrt{2p\log \frac{2}{\alpha }} \end{aligned}$$

Applying this bound to both \(\mathbf{b_1}\) and \(\mathbf{b_2}\) and taking a union bound, both inequalities hold simultaneously with probability at least \(1-\alpha\). AMP thus follows \(\left(\frac{n\gamma}{\Lambda}+\sqrt{2p\log \frac{2}{\alpha}}\left(\frac{4L}{\Lambda \epsilon_3}\left(1+\sqrt{2\log \frac{1}{\delta_1}}\right) + \frac{n\gamma}{\Lambda \epsilon_2}\left(1+\sqrt{2\log \frac{1}{\delta_2}}\right)\right), \alpha\right)\)-deviation.
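To make the bound concrete, the following sketch evaluates the deviation bound of Theorem 1 numerically and checks the Gaussian-norm tail inequality above by Monte Carlo sampling. It is illustrative only: the values of \(n\), \(p\), \(L\), \(\Lambda\), \(\gamma\), the privacy-budget shares, and \(\alpha\) are hypothetical, not the settings used in our experiments.

```python
import numpy as np

# Hypothetical parameters (not the experimental settings of the paper).
n, p = 10_000, 100                 # training-set size, model dimension
L, Lam, gamma = 1.0, 1e-3, 1e-4    # Lipschitz constant, regularization strength, optimization slack
eps2, eps3 = 0.1, 0.8              # privacy-budget shares used by AMP's two noise draws
delta1 = delta2 = 1e-6
alpha = 0.05                       # failure probability of the deviation bound

# Noise scales sigma_1 (objective perturbation) and sigma_2 (output perturbation),
# as given in the proof of Theorem 1.
sigma1 = (2 * L / n) * (1 + np.sqrt(2 * np.log(1 / delta1))) / eps3
sigma2 = (n * gamma / Lam) * (1 + np.sqrt(2 * np.log(1 / delta2))) / eps2

# Deviation bound of Theorem 1.
lam = (n * gamma / Lam
       + np.sqrt(2 * p * np.log(2 / alpha))
       * (4 * L / (Lam * eps3) * (1 + np.sqrt(2 * np.log(1 / delta1)))
          + n * gamma / (Lam * eps2) * (1 + np.sqrt(2 * np.log(1 / delta2)))))
print(f"Theorem 1 deviation bound: {lam:.3f}")

# Monte Carlo check of the Gaussian-norm tail bound used in the proof:
# ||b_s||_2 <= sigma_s * sqrt(2 p log(2/alpha)) with probability >= 1 - alpha/2.
rng = np.random.default_rng(0)
for sigma in (sigma1, sigma2):
    norms = np.linalg.norm(rng.normal(0.0, sigma, size=(20_000, p)), axis=1)
    frac = np.mean(norms <= sigma * np.sqrt(2 * p * np.log(2 / alpha)))
    print(f"sigma = {sigma:.3g}: empirical P(norm <= bound) = {frac:.4f} (target >= {1 - alpha / 2})")
```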

[Algorithm 2: pseudocode of PSGD]

We show the pseudocodes of PSGD in Algorithm 2. We then identify the deviation property of PSGD.

Lemma 2

PSGD follows \(\left(\frac{2p\ln(p/\alpha)kTL\eta}{n\epsilon}, \alpha\right)\)-deviation.

Proof of Lemma 2

The sensitivity of PSGD is \(\frac{2kTL\eta}{n}\) (see Corollary 1 in [5]). As the noise is added directly to the final model, the Euclidean distance between the private model and the non-private model is the \(L_2\) norm of the noise, which follows the Gamma distribution \(\Gamma(p, \frac{2kTL\eta}{n\epsilon})\). According to Theorem 2 in [5], for a noise vector \(\kappa\) whose \(L_2\) norm follows the Gamma distribution \(\Gamma(p, \Delta)\), with probability at least \(1-\alpha\) we have \(\left\| \kappa \right\|_2 \le p\Delta\ln(\frac{p}{\alpha})\). Therefore, PSGD follows \(\left(\frac{2p\ln(p/\alpha)kTL\eta}{n\epsilon}, \alpha\right)\)-deviation.
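As a small illustration of Lemma 2, the sketch below samples output-perturbation noise whose \(L_2\) norm follows \(\Gamma(p, \frac{2kTL\eta}{n\epsilon})\) with a uniformly random direction, and checks the tail bound \(\left\| \kappa \right\|_2 \le p\Delta\ln(\frac{p}{\alpha})\) by Monte Carlo. The parameter values are hypothetical, and this is a sketch rather than the exact sampler of [5].

```python
import numpy as np

# Hypothetical PSGD parameters (not the experimental settings of the paper).
p, n = 50, 10_000              # model dimension, training-set size
k, T = 1, 10_000               # mini-batch size, number of SGD iterations
L, eta, eps = 1.0, 0.05, 1.0   # Lipschitz constant, learning rate, privacy budget
alpha = 0.05

Delta = 2 * k * T * L * eta / n   # L2 sensitivity of the final iterate (Corollary 1 in [5])
scale = Delta / eps               # scale of the Gamma-distributed noise norm

rng = np.random.default_rng(0)

def output_perturbation_noise():
    """Noise vector with L2 norm ~ Gamma(p, Delta/eps) and a uniformly random direction."""
    direction = rng.normal(size=p)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=p, scale=scale) * direction

# Monte Carlo check of the tail bound used in Lemma 2:
# ||kappa||_2 <= p * (Delta/eps) * ln(p/alpha) with probability >= 1 - alpha.
norms = np.array([np.linalg.norm(output_perturbation_noise()) for _ in range(20_000)])
bound = p * scale * np.log(p / alpha)
print(f"bound = {bound:.3f}, empirical P(norm <= bound) = {np.mean(norms <= bound):.4f}")
```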

We then identify the deviation property of DPSGD under strong convexity and Lipschitz continuity assumptions on the loss function. The pseudocodes of DPSGD are shown in Algorithm 3.

[Algorithm 3: pseudocode of DPSGD]
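Since Algorithm 3 appears only as a figure in the published version, the following is a minimal sketch of the standard DPSGD update (per-example gradient clipping followed by calibrated Gaussian noise, in the spirit of [3]) for a logistic-regression loss. It is illustrative only: the loss, hyperparameters, and synthetic data are hypothetical, no privacy accounting is shown, and it is not Algorithm 3 verbatim.

```python
import numpy as np

def dpsgd(X, y, epochs=10, batch_size=64, lr=0.1, clip=1.0, sigma=1.0, seed=0):
    """Minimal DPSGD for a logistic-regression loss: per-example gradient
    clipping followed by calibrated Gaussian noise (in the spirit of [3])."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            # Per-example gradients of the logistic loss.
            z = X[idx] @ theta
            grads = (1.0 / (1.0 + np.exp(-z)) - y[idx])[:, None] * X[idx]
            # Clip each example's gradient to L2 norm at most `clip`.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip)
            # Average, add Gaussian noise scaled to the clipping bound, and step.
            noisy_grad = grads.mean(axis=0) + rng.normal(0.0, sigma * clip / len(idx), size=p)
            theta -= lr * noisy_grad
    return theta

# Toy usage on synthetic data (hypothetical hyperparameters; privacy accounting omitted).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)
print(dpsgd(X, y))
```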

Lemma 3

When applying DPSGD to optimize a \(\Delta\)-strongly convex and \(L\)-Lipschitz continuous loss function, if we set the learning rate at iteration \(t\) to \(\frac{1}{\Delta t}\), DPSGD follows \(\left(\frac{4(L^2 + p\sigma^2)}{\Delta^2 T\alpha}, \alpha\right)\)-deviation.

Proof of Lemma 3

Let \(G_t\) be the gradient at iteration \(t\). According to Theorem 2.4 of [22],

$$\begin{aligned} \mathbb {E}[\left\| G_t \right\| _{2}^{2}] \le L^2 + p\sigma ^2 \end{aligned}$$

Then according to Lemma 1 of [44],

$$\begin{aligned} \mathbb {E}[\left\| \theta _{t} - \theta ^* \right\| _2] \le \frac{4(L^2 + p\sigma ^2)}{\Delta ^2t} \end{aligned}$$

Finally, since \(\theta_{priv}\) is the final iterate \(\theta_T\), Markov's inequality gives

$$\begin{aligned} Pr(\left\| \theta _{priv} - \theta ^* \right\| _2 \le \frac{4(L^2 + p\sigma ^2)}{\Delta ^2T\alpha }) \ge 1-\alpha \end{aligned}$$
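As a worked example, the bound of Lemma 3 can be evaluated directly for hypothetical values of \(L\), \(p\), \(\sigma\), \(\Delta\), \(T\), and \(\alpha\):

```python
import numpy as np

# Hypothetical values for the quantities appearing in Lemma 3.
L, p, sigma = 1.0, 100, 0.5          # Lipschitz constant, dimension, per-coordinate noise std
Delta, T, alpha = 0.1, 10_000, 0.05  # strong-convexity constant, iterations, failure probability

grad_bound = L**2 + p * sigma**2                    # bound on E[||G_t||_2^2]
deviation = 4 * grad_bound / (Delta**2 * T * alpha)
print(f"With probability >= {1 - alpha}, ||theta_priv - theta*||_2 <= {deviation:.3f}")
```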

The deviation properties of AMP, PSGD and DPSGD show that \(\lambda\) is inversely proportional to \(\alpha\). Therefore, with high probability, these algorithms deviate the private hyperplane only slightly from the original hyperplane.

B: Empirical Results on the Remaining Three Datasets

We train Linear SVM, Kernel SVM and LR models on the German, Student, and Arrhythmia datasets under the same settings as in Section 4. The test results are shown in Figures 10, 11 and 12. Although the variances are large, the average results show that when a significant TPR gap exists in the non-private model, the private models have larger TPR gaps. On the other hand, when the TPR gaps of the non-private models are negligible, the private models have TPR gaps similar to, or even smaller than, those of the non-private models. We then explain why the TPR gaps of margin classifiers trained on these three datasets have such large variances.

Fig. 10: TPR gaps of non-private and differentially private SVM models trained on the German, Student, and Arrhythmia datasets

Fig. 11: TPR gaps of non-private and differentially private Kernel SVM models trained on the German\(_{185,0.3}\), Student\(_{225,0.1}\), and Arrhythmia\(_{390,0.3}\) datasets. The subscripts indicate the dimension of the target feature space and the standard deviation of the kernel function approximation method

Fig. 12: TPR gaps of non-private and differentially private LR models trained on the German, Student, and Arrhythmia datasets

The German, Student, and Arrhythmia datasets are summarized in Table 6. The sizes of these three datasets are all at most 1,000, so their test sets contain at most 200 samples. Even if the labels are evenly distributed and the groups contain equal numbers of samples, each group has at most 50 'positive' samples in the test set. Therefore, flipping a single sample's prediction changes the TPR of the corresponding group by at least 2%. As a result, the test results on these datasets are strongly affected by the randomness of noise sampling and all exhibit large variances.
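A small numerical illustration of this sensitivity, assuming a group with 50 positive test samples:

```python
# With only 50 positive test samples in a group, flipping a single prediction
# moves that group's TPR by 1/50 = 2%, so small test sets yield noisy TPR gaps.
positives = 50
tpr_before = 40 / positives   # e.g. 40 of 50 positives correctly classified
tpr_after = 39 / positives    # one correct prediction flipped by the injected noise
print(f"TPR change from one flipped prediction: {abs(tpr_before - tpr_after):.2%}")
```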

Table 6 Overview of supplementary datasets


Cite this article

Ruan, W., Xu, M., Jing, Y. et al. Towards Understanding the fairness of differentially private margin classifiers. World Wide Web 26, 1201–1221 (2023). https://doi.org/10.1007/s11280-022-01088-1
