
Towards Understanding the fairness of differentially private margin classifiers


Abstract

Margin classifiers, such as Support Vector Machines, play a critical role in high-stakes decision domains. In recent years, differential privacy has been widely employed in margin classifiers to protect user privacy. However, incorporating differential privacy into margin classifiers may cause a fairness issue, in the sense that differentially private margin classifiers can have significantly different true positive rates on groups determined by sensitive attributes (e.g., race). To address this issue, we set out to identify the factor that dominates the fairness of differentially private margin classifiers through well-designed experiments and further analysis. We first conduct an empirical study of three classical margin classifiers, each learned via three representative differentially private empirical risk minimization algorithms. The empirical results show that the fairness of differentially private margin classifiers strongly depends on the fairness of their non-private counterparts. We then analyze how differential privacy impacts the fairness of margin classifiers and confirm the empirical findings. In general, our study shows that when non-private margin classifiers are fair, the fairness of their differentially private counterparts can be ensured.


Notes

  1. https://github.com/propublica/Compas-analysis

  2. https://github.com/sunblaze-ucb/dpml-benchmark

References

  1. de Paula, D.A.V., Artes, R., Ayres, F., Minardi, A.: Estimating credit and profit scoring of a Brazilian credit union with logistic regression and machine-learning techniques. RAUSP Manage. J. 54, 321–336 (2019)


  2. Zhang, L., Hu, H., Zhang, D.: A credit risk assessment model based on SVM for small and medium enterprises in supply chain finance. Financ. Innov. 1(14), 1–21 (2015)


  3. Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318 (2016)

  4. Iyengar, R., Near, J.P., Song, D., Thakkar, O., Thakurta, A., Wang, L.: Towards practical differentially private convex optimization. In: Proceedings of 2019 IEEE Symposium on Security and Privacy (SP), pp. 299–316. IEEE (2019)

  5. Wu, X., Li, F., Kumar, A., Chaudhuri, K., Jha, S., Naughton, J.: Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1307–1322 (2017)

  6. Yu, D., Zhang, H., Chen, W., Liu, T.-Y.: Do not let privacy overbill utility: Gradient embedding perturbation for private learning. In: ICLR 2021 (2021)

  7. Zhou, Y., Wu, S., Banerjee, A.: Bypassing the ambient dimension: Private SGD with gradient subspace identification. In: International Conference on Learning Representations (2021)

  8. Huang, X., Ding, Y., Jiang, Z.L., Qi, S., Wang, X., Liao, Q.: Dp-fl: a novel differentially private federated learning framework for the unbalanced data. World Wide Web 23(4), 2529–2545 (2020)


  9. Chouldechova, A., Roth, A.: A snapshot of the frontiers of fairness in machine learning. Commun. ACM 63(5), 82–89 (2020)


  10. Ranjbar Kermany, N., Zhao, W., Yang, J., Wu, J., Pizzato, L.: A fairness-aware multi-stakeholder recommender system. World Wide Web 24(6), 1995–2018 (2021)


  11. Donini, M., Oneto, L., Ben-David, S., Shawe-Taylor, J., Pontil, M.: Empirical risk minimization under fairness constraints. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18, pp. 2796–2806. Curran Associates Inc., (2018)

  12. Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems, pp. 3315–3323 (2016)

  13. Mandal, D., Deng, S., Jana, S., Wing, J., Hsu, D.J.: Ensuring fairness beyond the training data. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 18445–18456 (2020)

  14. Roh, Y., Lee, K., Whang, S.E., Suh, C.: Fairbatch: Batch selection for model fairness. In: International Conference on Learning Representations (2021)

  15. Hu, R., Zhu, X., Zhu, Y., Gan, J.: Robust SVM with adaptive graph learning. World Wide Web 23(3), 1945–1968 (2020)


  16. Bagdasaryan, E., Poursaeed, O., Shmatikov, V.: Differential privacy has disparate impact on model accuracy. In: Advances in Neural Information Processing Systems, pp. 15479–15488 (2019)

  17. Farrand, T., Mireshghallah, F., Singh, S., Trask, A.: Neither private nor fair: Impact of data imbalance on utility and fairness in differential privacy. In: Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice. PPMLP’20, pp. 15–19. Association for Computing Machinery, (2020)

  18. Berk, R., Heidari, H., Jabbari, S., Kearns, M., Roth, A.: Fairness in criminal justice risk assessments: The state of the art. Sociol. Meth. Res. 50(1), 3–44 (2021)


  19. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226 (2012)

  20. Hebert-Johnson, U., Kim, M., Reingold, O., Rothblum, G.: Multicalibration: Calibration for the (Computationally-identifiable) masses. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1939–1948 (2018)

  21. Chaudhuri, K., Monteleoni, C., Sarwate, A.D.: Differentially private empirical risk minimization. Journal of Machine Learning Research 12(3) (2011)

  22. Bassily, R., Smith, A., Thakurta, A.: Private empirical risk minimization: Efficient algorithms and tight error bounds. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 464–473. IEEE (2014)

  23. Su, D., Cao, J., Li, N., Bertino, E., Lyu, M., Jin, H.: Differentially private k-means clustering and a hybrid approach to private optimization. ACM Trans. Priv. Sec. (TOPS) 20(4), 1–33 (2017)


  24. Jain, P., Kothari, P., Thakurta, A.: Differentially private online learning. In: Proceedings of Conference on Learning Theory, pp. 24–1 (2012)

  25. Bu, Z., Dong, J., Long, Q., Su, W.J.: Deep learning with Gaussian differential privacy. Harvard Data Science Review 2020(23) (2020)

  26. Jagielski, M., Kearns, M., Mao, J., Oprea, A., Roth, A., Sharifi-Malvajerdi, S., Ullman, J.: Differentially private fair learning. In: International Conference on Machine Learning, pp. 3000–3008. PMLR (2019)

  27. Cummings, R., Gupta, V., Kimpara, D., Morgenstern, J.: On the compatibility of privacy and fairness. In: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization. UMAP’19 Adjunct, pp. 309–315. Association for Computing Machinery, (2019)

  28. Ding, J., Zhang, X., Li, X., Wang, J., Yu, R., Pan, M.: Differentially private and fair classification via calibrated functional mechanism. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 622–629 (2020)

  29. Khalili, M.M., Zhang, X., Abroshan, M., Sojoudi, S.: Improving fairness and privacy in selection problems. In: Proceedings of the AAAI Conference on Artificial Intelligence (2021)

  30. Mozannar, H., Ohannessian, M.I., Srebro, N.: Fair learning with private demographic data. arXiv preprint arXiv:2002.11651 (2020)

  31. Tran, C., Fioretto, F., Hentenryck, P.V.: Differentially private and fair deep learning: A lagrangian dual approach. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pp. 9932–9939 (2021)

  32. Xu, D., Du, W., Wu, X.: Removing disparate impact of differentially private stochastic gradient descent on model accuracy. arXiv preprint arXiv:2003.03699 (2020)

  33. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press (2012)

  34. Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Large margin classifiers: convex loss, low noise, and convergence rates. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1173–1180 (2004)

  35. Dwork, C.: Differential privacy. In: Proceedings of Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Part II, pp. 1–12 (2006)

  36. Zafar, M.B., Valera, I., Gomez Rodriguez, M., Gummadi, K.P.: Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In: Proceedings of the 26th International Conference on World Wide Web. WWW ’17, pp. 1171–1180. International World Wide Web Conferences Steering Committee, (2017)

  37. Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml

  38. Cherkassky, V., Ma, Y.: Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 17(1), 113–126 (2004)


  39. Rahimi, A., Recht, B.: Uniform approximation of functions with random bases. In: 2008 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 555–561 (2008)

  40. Heaven, D.: Why deep-learning AIs are so easy to fool. Nature, 163–166 (2019)

  41. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 206–215 (2019)

  42. Xue, M., He, C., Wang, J., Liu, W.: One-to-n & n-to-one: Two advanced backdoor attacks against deep learning models. IEEE Transactions on Dependable and Secure Computing (2020)

  43. Dasgupta, S., Schulman, L.: A probabilistic analysis of EM for mixtures of separated, spherical Gaussians. J. Mach. Learn. Res. 8, 203–226 (2007)


  44. Rakhlin, A., Shamir, O., Sridharan, K.: Making gradient descent optimal for strongly convex stochastic optimization. In: Proceedings of the 29th International Conference on Machine Learning. ICML'12, pp. 1571–1578. Omnipress (2012)


Acknowledgements

This work is supported by the National Key R&D Program of China (2019YFE0103800) and the Natural Science Foundation of China (U1836207). We thank Professor X. Sean Wang and Chuanwang Wang for their insightful comments.

Author information


Corresponding author

Correspondence to Weili Han.

Appendices

A: Deviation properties of AMP, PSGD, DPSGD

In this section, we identify the deviation properties of AMP, PSGD and DPSGD.

[Algorithm 1: pseudocode of AMP]

The pseudocodes of AMP are shown in Algorithm 1. According to the design of AMP, we identify its deviation property as follows.

Theorem 1

AMP follows \(\left(\frac{n\gamma}{\Lambda}+\sqrt{2p\log \frac{2}{\alpha}}\left(\frac{4L}{\Lambda \epsilon_3}\left(1+\sqrt{2\log \frac{1}{\delta_1}}\right) + \frac{n\gamma}{\Lambda \epsilon_2}\left(1+\sqrt{2\log \frac{1}{\delta_2}}\right)\right), \alpha\right)\)-deviation.

Proof of Theorem 1

The utility guarantee of AMP consists of two parts. First, it bounds the distance between the optimal model parameters \(\theta_{approx}\) of the perturbed (private) loss function and the optimal model parameters \(\theta^*\) of the non-private loss function. Second, it bounds the distance between the private output \(\theta_{out}\) and \(\theta_{approx}\). The first bound is \(\frac{2n\left\| \mathbf{b_1} \right\|_2}{\Lambda}\) (see inequality 10 of [4]). The second bound is \(\frac{n\gamma}{\Lambda} + \left\| \mathbf{b_2} \right\|_2\) (see inequality 5 of [4]). Therefore, the total deviation of the model parameters is bounded by \(\frac{n(\gamma+2\left\| \mathbf{b_1} \right\|_2)}{\Lambda} + \left\| \mathbf{b_2} \right\|_2\), where \(\mathbf{b_1}\) and \(\mathbf{b_2}\) are distributed as \(\mathcal{N}(0,\sigma_1^2 I_{p\times p})\) and \(\mathcal{N}(0,\sigma_2^2 I_{p\times p})\), respectively, with \(\sigma_1 = \frac{\frac{2L}{n}(1+\sqrt{2\log \frac{1}{\delta_1}})}{\epsilon_3}\) and \(\sigma_2 = \frac{\frac{n\gamma}{\Lambda}(1+\sqrt{2\log \frac{1}{\delta_2}})}{\epsilon_2}\).

According to Lemma 2 in [43]: with probability \(\ge 1-\frac{\alpha }{2}\),

$$\begin{aligned} \left\| \mathbf {b_s} \right\| _2 \le \sigma _s\sqrt{2p\log \frac{2}{\alpha }} \end{aligned}$$

Applying this bound to both \(\mathbf{b_1}\) and \(\mathbf{b_2}\) and taking a union bound, both inequalities hold simultaneously with probability at least \(1-\alpha\). AMP thus follows \(\left(\frac{n\gamma}{\Lambda}+\sqrt{2p\log \frac{2}{\alpha}}\left(\frac{4L}{\Lambda \epsilon_3}\left(1+\sqrt{2\log \frac{1}{\delta_1}}\right) + \frac{n\gamma}{\Lambda \epsilon_2}\left(1+\sqrt{2\log \frac{1}{\delta_2}}\right)\right), \alpha\right)\)-deviation.
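To make the bound concrete, the following sketch evaluates the deviation bound of Theorem 1 numerically and checks the Gaussian-norm tail inequality above by Monte Carlo sampling. It is illustrative only: the values of \(n\), \(p\), \(L\), \(\Lambda\), \(\gamma\), the privacy-budget shares, and \(\alpha\) are hypothetical, not the settings used in our experiments.

```python
import numpy as np

# Hypothetical parameters (not the experimental settings of the paper).
n, p = 10_000, 100                 # training-set size, model dimension
L, Lam, gamma = 1.0, 1e-3, 1e-4    # Lipschitz constant, regularization strength, optimization slack
eps2, eps3 = 0.1, 0.8              # privacy-budget shares used by AMP's two noise draws
delta1 = delta2 = 1e-6
alpha = 0.05                       # failure probability of the deviation bound

# Noise scales sigma_1 (objective perturbation) and sigma_2 (output perturbation),
# as given in the proof of Theorem 1.
sigma1 = (2 * L / n) * (1 + np.sqrt(2 * np.log(1 / delta1))) / eps3
sigma2 = (n * gamma / Lam) * (1 + np.sqrt(2 * np.log(1 / delta2))) / eps2

# Deviation bound of Theorem 1.
lam = (n * gamma / Lam
       + np.sqrt(2 * p * np.log(2 / alpha))
       * (4 * L / (Lam * eps3) * (1 + np.sqrt(2 * np.log(1 / delta1)))
          + n * gamma / (Lam * eps2) * (1 + np.sqrt(2 * np.log(1 / delta2)))))
print(f"Theorem 1 deviation bound: {lam:.3f}")

# Monte Carlo check of the Gaussian-norm tail bound used in the proof:
# ||b_s||_2 <= sigma_s * sqrt(2 p log(2/alpha)) with probability >= 1 - alpha/2.
rng = np.random.default_rng(0)
for sigma in (sigma1, sigma2):
    norms = np.linalg.norm(rng.normal(0.0, sigma, size=(20_000, p)), axis=1)
    frac = np.mean(norms <= sigma * np.sqrt(2 * p * np.log(2 / alpha)))
    print(f"sigma = {sigma:.3g}: empirical P(norm <= bound) = {frac:.4f} (target >= {1 - alpha / 2})")
```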

[Algorithm 2: pseudocode of PSGD]

We show the pseudocodes of PSGD in Algorithm 2. We then identify the deviation property of PSGD.

Lemma 2

PSGD follows \(\left(\frac{2p\ln(p/\alpha)kTL\eta}{n\epsilon}, \alpha\right)\)-deviation.

Proof of Lemma 2

The sensitivity of PSGD is \(\frac{2kTL\eta}{n}\) (see Corollary 1 in [5]). As the noise is added directly to the final model, the Euclidean distance between the private model and the non-private model is the \(L_2\) norm of the noise, which follows the Gamma distribution \(\Gamma(p, \frac{2kTL\eta}{n\epsilon})\). According to Theorem 2 in [5], for a noise vector \(\kappa\) whose \(L_2\) norm follows the Gamma distribution \(\Gamma(p, \Delta)\), with probability at least \(1-\alpha\) we have \(\left\| \kappa \right\|_2 \le p\Delta\ln(\frac{p}{\alpha})\). Therefore, PSGD follows \(\left(\frac{2p\ln(p/\alpha)kTL\eta}{n\epsilon}, \alpha\right)\)-deviation.
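As a small illustration of Lemma 2, the sketch below samples output-perturbation noise whose \(L_2\) norm follows \(\Gamma(p, \frac{2kTL\eta}{n\epsilon})\) with a uniformly random direction, and checks the tail bound \(\left\| \kappa \right\|_2 \le p\Delta\ln(\frac{p}{\alpha})\) by Monte Carlo. The parameter values are hypothetical, and this is a sketch rather than the exact sampler of [5].

```python
import numpy as np

# Hypothetical PSGD parameters (not the experimental settings of the paper).
p, n = 50, 10_000              # model dimension, training-set size
k, T = 1, 10_000               # mini-batch size, number of SGD iterations
L, eta, eps = 1.0, 0.05, 1.0   # Lipschitz constant, learning rate, privacy budget
alpha = 0.05

Delta = 2 * k * T * L * eta / n   # L2 sensitivity of the final iterate (Corollary 1 in [5])
scale = Delta / eps               # scale of the Gamma-distributed noise norm

rng = np.random.default_rng(0)

def output_perturbation_noise():
    """Noise vector with L2 norm ~ Gamma(p, Delta/eps) and a uniformly random direction."""
    direction = rng.normal(size=p)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=p, scale=scale) * direction

# Monte Carlo check of the tail bound used in Lemma 2:
# ||kappa||_2 <= p * (Delta/eps) * ln(p/alpha) with probability >= 1 - alpha.
norms = np.array([np.linalg.norm(output_perturbation_noise()) for _ in range(20_000)])
bound = p * scale * np.log(p / alpha)
print(f"bound = {bound:.3f}, empirical P(norm <= bound) = {np.mean(norms <= bound):.4f}")
```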

We then identify the deviation property of DPSGD under strong convexity and Lipschitz continuity assumptions on the loss function. The pseudocodes of DPSGD are shown in Algorithm 3.

[Algorithm 3: pseudocode of DPSGD]
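Since Algorithm 3 appears only as a figure in the published version, the following is a minimal sketch of the standard DPSGD update (per-example gradient clipping followed by calibrated Gaussian noise, in the spirit of [3]) for a logistic-regression loss. It is illustrative only: the loss, hyperparameters, and synthetic data are hypothetical, no privacy accounting is shown, and it is not Algorithm 3 verbatim.

```python
import numpy as np

def dpsgd(X, y, epochs=10, batch_size=64, lr=0.1, clip=1.0, sigma=1.0, seed=0):
    """Minimal DPSGD for a logistic-regression loss: per-example gradient
    clipping followed by calibrated Gaussian noise (in the spirit of [3])."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            # Per-example gradients of the logistic loss.
            z = X[idx] @ theta
            grads = (1.0 / (1.0 + np.exp(-z)) - y[idx])[:, None] * X[idx]
            # Clip each example's gradient to L2 norm at most `clip`.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip)
            # Average, add Gaussian noise scaled to the clipping bound, and step.
            noisy_grad = grads.mean(axis=0) + rng.normal(0.0, sigma * clip / len(idx), size=p)
            theta -= lr * noisy_grad
    return theta

# Toy usage on synthetic data (hypothetical hyperparameters; privacy accounting omitted).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)
print(dpsgd(X, y))
```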

Lemma 3

When applying DPSGD to optimize a \(\Delta\)-strongly convex and \(L\)-Lipschitz continuous loss function, if we set the learning rate at iteration \(t\) to \(\frac{1}{\Delta t}\), DPSGD follows \(\left(\frac{4(L^2 + p\sigma^2)}{\Delta^2 T\alpha}, \alpha\right)\)-deviation.

Proof of Lemma 3

Let \(G_t\) be the gradient at iteration \(t\). According to Theorem 2.4 of [22],

$$\begin{aligned} \mathbb {E}[\left\| G_t \right\| _{2}^{2}] \le L^2 + p\sigma ^2 \end{aligned}$$

Then according to Lemma 1 of [44],

$$\begin{aligned} \mathbb {E}[\left\| \theta _{t} - \theta ^* \right\| _2] \le \frac{4(L^2 + p\sigma ^2)}{\Delta ^2t} \end{aligned}$$

Finally, since \(\theta_{priv}\) is the final iterate \(\theta_T\), Markov's inequality gives

$$\begin{aligned} Pr(\left\| \theta _{priv} - \theta ^* \right\| _2 \le \frac{4(L^2 + p\sigma ^2)}{\Delta ^2T\alpha }) \ge 1-\alpha \end{aligned}$$
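As a worked example, the bound of Lemma 3 can be evaluated directly for hypothetical values of \(L\), \(p\), \(\sigma\), \(\Delta\), \(T\), and \(\alpha\):

```python
import numpy as np

# Hypothetical values for the quantities appearing in Lemma 3.
L, p, sigma = 1.0, 100, 0.5          # Lipschitz constant, dimension, per-coordinate noise std
Delta, T, alpha = 0.1, 10_000, 0.05  # strong-convexity constant, iterations, failure probability

grad_bound = L**2 + p * sigma**2                    # bound on E[||G_t||_2^2]
deviation = 4 * grad_bound / (Delta**2 * T * alpha)
print(f"With probability >= {1 - alpha}, ||theta_priv - theta*||_2 <= {deviation:.3f}")
```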

The deviation properties of AMP, PSGD and DPSGD show that \(\lambda\) is inversely proportional to \(\alpha\). Therefore, with high probability, these algorithms deviate the private hyperplane only slightly from the original hyperplane.

B: Empirical Results on the Remaining Three Datasets

We train Linear SVM, Kernel SVM and LR models on the German, Student, and Arrhythmia datasets under the same settings as in Section 4. The test results are shown in Figures 10, 11 and 12. Although the variances are large, the average results show that when a significant TPR gap exists in the non-private model, the private models have larger TPR gaps. On the other hand, when the TPR gaps of the non-private models are negligible, the private models have TPR gaps similar to, or even smaller than, those of the non-private models. We then explain why the TPR gaps of margin classifiers trained on these three datasets have such large variances.

Fig. 10: TPR gaps of non-private and differentially private SVM models trained on the German, Student, and Arrhythmia datasets

Fig. 11: TPR gaps of non-private and differentially private Kernel SVM models trained on the German\(_{185,0.3}\), Student\(_{225,0.1}\), and Arrhythmia\(_{390,0.3}\) datasets. The subscripts indicate the dimension of the target feature space and the standard deviation of the kernel function approximation method

Fig. 12: TPR gaps of non-private and differentially private LR models trained on the German, Student, and Arrhythmia datasets

The German, Student, and Arrhythmia datasets are summarized in Table 6. The sizes of these three datasets are all at most 1,000, so their test sets contain at most 200 samples. Even if the labels are evenly distributed and the groups contain equal numbers of samples, each group has at most 50 'positive' samples in the test set. Therefore, flipping a single sample's prediction changes the TPR of the corresponding group by at least 2%. As a result, the test results on these datasets are strongly affected by the randomness of noise sampling and all exhibit large variances.
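A small numerical illustration of this sensitivity, assuming a group with 50 positive test samples:

```python
# With only 50 positive test samples in a group, flipping a single prediction
# moves that group's TPR by 1/50 = 2%, so small test sets yield noisy TPR gaps.
positives = 50
tpr_before = 40 / positives   # e.g. 40 of 50 positives correctly classified
tpr_after = 39 / positives    # one correct prediction flipped by the injected noise
print(f"TPR change from one flipped prediction: {abs(tpr_before - tpr_after):.2%}")
```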

Table 6 Overview of supplementary datasets


Cite this article

Ruan, W., Xu, M., Jing, Y. et al. Towards Understanding the fairness of differentially private margin classifiers. World Wide Web 26, 1201–1221 (2023). https://doi.org/10.1007/s11280-022-01088-1
