
Data Removal from an AUC Optimization Model

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13280)

Abstract

A learned model may need to be adjusted dynamically when training data are removed, for example for privacy reasons or in adversarial learning. Previous studies on this issue mostly focus on standard classification accuracy. This work takes a step towards data removal for AUC optimization, where previous methods cannot be applied directly since AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop the Data Removal algorithm for AUC optimization (DRAUC); the basic idea is to adjust the trained model according to the removed data rather than retrain another model from scratch. Our algorithm only needs to maintain some data statistics, without storing the training data in memory. For high-dimensional data, we utilize the frequent directions algorithm to approximate the second-order statistics and compute the solution numerically by gradient descent, so as to avoid calculating the inverse of the Hessian matrix. We verify the effectiveness of the proposed DRAUC both theoretically and empirically.
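
As a concrete illustration of the statistics-maintenance idea, the sketch below keeps per-class counts, feature sums, and second-moment matrices, so a batch of removals can be absorbed by subtracting its contribution without revisiting the raw training set. The class layout, names, and choice of statistics here are illustrative assumptions, not the authors' released implementation (see footnote 1).

```python
# Illustrative sketch (not the authors' implementation): per-class sufficient
# statistics of the kind needed to rebuild the quadratic objective in Appendix A.
# Removing a batch of instances only subtracts its contribution, so the raw
# training set never has to be kept in memory.
import numpy as np

class ClassStats:
    def __init__(self, d):
        self.n = 0                         # number of instances in this class
        self.s = np.zeros(d)               # running feature sum
        self.m2 = np.zeros((d, d))         # running sum of outer products x x^T

    def add(self, X):
        self.n += X.shape[0]
        self.s += X.sum(axis=0)
        self.m2 += X.T @ X

    def remove(self, X_removed):           # data removal: subtract contributions
        self.n -= X_removed.shape[0]
        self.s -= X_removed.sum(axis=0)
        self.m2 -= X_removed.T @ X_removed

    def mean(self):
        return self.s / self.n

    def covariance(self):
        c = self.mean()
        return self.m2 / self.n - np.outer(c, c)
```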


Notes

  1. https://github.com/sven-lijie/DRAUC.

  2. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.

  3. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.

References

  1. Brzezinski, D., Stefanowski, J.: Prequential AUC: properties of the area under the ROC curve for data streams with concept drift. Knowl. Inf. Syst. 52(2), 531–562 (2017). https://doi.org/10.1007/s10115-017-1022-8

  2. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)

  3. Cao, Y., Yang, J.: Towards making systems forget with machine unlearning. In: 2015 IEEE Symposium on Security and Privacy, pp. 463–480. IEEE (2015)

  4. Cook, R., Weisberg, S.: Residuals and Influence in Regression. Chapman and Hall, New York (1982)

  5. Flach, P., Hernández-Orallo, J., Ramirez, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th International Conference on Machine Learning, pp. 657–664 (2011)

  6. Gao, W., Jin, R., Zhu, S., Zhou, Z.H.: One-pass AUC optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 906–914 (2013)

  7. Gao, W., Zhou, Z.H.: On the consistency of AUC pairwise optimization. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 939–945 (2015)

  8. Ghashami, M., Liberty, E., Phillips, J., Woodruff, D.: Frequent directions: simple and deterministic matrix sketching. SIAM J. Comput. 45(5), 1762–1792 (2016)

  9. Ginart, A., Guan, M., Valiant, G., Zou, J.: Making AI forget you: data deletion in machine learning. In: Advances in Neural Information Processing Systems, vol. 32, pp. 3518–3531 (2019)

  10. Guo, C., Goldstein, T., Hannun, A., van der Maaten, L.: Certified data removal from machine learning models. In: Proceedings of the 37th International Conference on Machine Learning, pp. 3832–3842 (2020)

  11. Herschtal, A., Raskutti, B.: Optimising area under the ROC curve using gradient descent. In: Proceedings of the 21st International Conference on Machine Learning, p. 49 (2004)

  12. Izzo, Z., Anne Smart, M., Chaudhuri, K., Zou, J.: Approximate data deletion from machine learning models. In: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, pp. 2008–2016 (2021)

  13. Koh, P., Liang, P.: Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1885–1894 (2017)

  14. Liu, M., Yuan, Z., Ying, Y., Yang, T.: Stochastic AUC maximization with deep neural networks. In: Proceedings of the 8th International Conference on Learning Representations (2020)

  15. Liu, M., Zhang, X., Chen, Z., Wang, X., Yang, T.: Fast stochastic AUC maximization with \(O(1/n)\)-convergence rate. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3189–3197 (2018)

  16. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.: Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inform. 19(6), 1893–1905 (2014)

  17. Natole, M., Ying, Y., Lyu, S.: Stochastic proximal algorithms for AUC maximization. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3710–3719 (2018)

  18. Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the 15th International Conference on Machine Learning, pp. 445–453 (1998)

  19. Shen, S.Q., Yang, B.B., Gao, W.: AUC optimization with a reject option. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 5684–5691 (2020)

  20. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy, pp. 3–18 (2017)

  21. Voigt, P., Von dem Bussche, A.: The EU General Data Protection Regulation (GDPR), vol. 10. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57959-7

  22. Wu, J., Brubaker, S., Mullin, M., Rehg, J.: Fast asymmetric learning for cascade face detection. IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 369–382 (2008)

  23. Wu, Y., Dobriban, E., Davidson, S.: DeltaGrad: rapid retraining of machine learning models. In: Proceedings of the 37th International Conference on Machine Learning, pp. 10355–10366 (2020)

  24. Ying, Y., Wen, L., Lyu, S.: Stochastic online AUC maximization. In: Advances in Neural Information Processing Systems, vol. 29, pp. 451–459 (2016)

Acknowledgement

The authors thank the anonymous reviewers for their helpful comments and suggestions. This research was supported by the National Science Foundation of China (61921006, 61876078).

Author information

Correspondence to Wei Gao.

A Analysis of DRAUC with High-Dimensional Data

We introduce the following lemma [2] for strongly convex and smooth functions:

Lemma 1

Let \(\mathbf {w}^* = \arg \min _{\mathbf {w}} \mathcal {L}(\mathbf {w})\) w.r.t. \(\mu \)-strongly convex and \(\beta \)-smooth function \(\mathcal {L}(\mathbf {w})\). After t iterations of gradient descent with step size \(\eta _t = {2}/(\beta +\mu )\), we have

$$ \Vert \mathbf {w}_t - \mathbf {w}^* \Vert \le \left( \frac{\beta - \mu }{\beta + \mu } \right) ^t \Vert \mathbf {w}_0 - \mathbf {w}^*\Vert . $$
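
The contraction rate in Lemma 1 can be checked numerically. The snippet below runs gradient descent with step size \(2/(\beta +\mu )\) on a randomly generated strongly convex quadratic (a hypothetical example, not data from the paper) and verifies that the distance to the minimizer shrinks at least as fast as \(((\beta -\mu )/(\beta +\mu ))^t\).

```python
# Numerical check of Lemma 1 on a hypothetical strongly convex quadratic
# L(w) = 0.5 * w^T A w - b^T w, whose minimizer is w* = A^{-1} b.
import numpy as np

rng = np.random.default_rng(0)
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T + 0.5 * np.eye(d)            # Hessian: strongly convex and smooth
b = rng.standard_normal(d)

eigs = np.linalg.eigvalsh(A)
mu, beta = eigs[0], eigs[-1]             # strong convexity / smoothness constants
eta = 2.0 / (beta + mu)                  # step size from Lemma 1
rate = (beta - mu) / (beta + mu)         # guaranteed contraction per iteration

w_star = np.linalg.solve(A, b)
w = np.zeros(d)
err0 = np.linalg.norm(w - w_star)
for t in range(1, 51):
    w = w - eta * (A @ w - b)            # one gradient step
    assert np.linalg.norm(w - w_star) <= rate ** t * err0 + 1e-9
```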

We introduce a lemma for AUC optimization as follows:

Lemma 2

For bounded space \(\mathcal {W}=\{\mathbf {w}:\Vert \mathbf {w}\Vert \le B\}\), let \(\mathbf {w}^*=\arg \min _{\mathbf {w}\in \mathcal {W}} \mathcal {L}(\mathbf {w};S_n)\) and \(\mathbf {w}^*_{-R}=\arg \min _{\mathbf {w}\in \mathcal {W}}\mathcal {L}(\mathbf {w};S_n {\setminus } R)\). For regularization parameter \(\lambda >0\), we have

$$ \Vert \mathbf {w}^* -\mathbf {w}^*_{-R} \Vert \le 4B \frac{4 + \lambda }{\lambda }\left( \frac{r_+}{n_+}+\frac{r_-}{n_-}-\frac{r_+r_-}{n_+n_-}\right) . $$

Proof

From the definition of \(\mathcal {L}(\mathbf {w};S_n)\), we have

$$ \Vert \nabla \mathcal {L}(\mathbf {w}_1;S_n) - \nabla \mathcal {L}(\mathbf {w}_2;S_n) \Vert \le (4+\lambda ) \Vert \mathbf {w}_1- \mathbf {w}_2\Vert , $$

where \(\mathbf {w}_1, \mathbf {w}_2\in \mathcal {W}\), \(\Vert \mathbf {x}_i\Vert \le 1\) and \(\Vert \mathbf {x}_j\Vert \le 1\), and thus \(\mathcal {L}(\mathbf {w};S_n)\) is \((4+ \lambda )\)-smooth. From Cauchy’s mean-value theorem, we have

$$\begin{aligned} \left| \mathcal {L}(\mathbf {w}_1;S_n) - \mathcal {L}(\mathbf {w}_2;S_n) \right|= & {} | \nabla \mathcal {L}^\top (\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 ;S_n) (\mathbf {w}_1 - \mathbf {w}_2) | \nonumber \\\le & {} \Vert \nabla \mathcal {L}(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 ;S_n) \Vert \Vert \mathbf {w}_1 - \mathbf {w}_2 \Vert \end{aligned}$$
(12)

where \(\kappa \in [0,1]\) and \(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 \in \mathcal {W}\). We also have \(\nabla \mathcal {L}(\mathbf {w}^*;S_n) = 0\), and it holds that

$$\begin{aligned} \Vert \nabla \mathcal {L}(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 ;S_n) \Vert \le \max _{\mathbf {w}\in \mathcal {W}} \Vert \nabla \mathcal {L}(\mathbf {w};S_n) - \nabla \mathcal {L}(\mathbf {w}^*;S_n) \Vert \le 2B(4+\lambda ) \end{aligned}$$

which yields that \(\mathcal {L}(\mathbf {w};S_n)\) is \(2B(4+\lambda )\)-Lipschitz from Eq. (12). Recall that \(\varDelta (\mathbf {w}) = n_+ n_- \mathcal {L}(\mathbf {w}; S_n) - (n_+-r_+)(n_--r_-)\mathcal {L}(\mathbf {w};S_n {\setminus } R )\), and we have

$$\begin{aligned} n_+ n_- \left( \mathcal {L}(\mathbf {w}^*_{-R};S_n) - \mathcal {L}(\mathbf {w}^*;S_n) \right)= & {} \varDelta (\mathbf {w}^*_{-R}) - \varDelta (\mathbf {w}^*) + (n_+ - r_+)(n_- - r_-)\left( \mathcal {L}(\mathbf {w}^*_{-R};S_n {\setminus } R) - \mathcal {L}(\mathbf {w}^*;S_n {\setminus } R) \right) \nonumber \\\le & {} \varDelta (\mathbf {w}^*_{-R}) - \varDelta (\mathbf {w}^*) \nonumber \\\le & {} 2B\left( n_+ n_- - (n_+ - r_+)(n_- - r_-)\right) (\lambda +4) \Vert \mathbf {w}^*_{-R} - \mathbf {w}^*\Vert \end{aligned}$$
(13)

where the first inequality holds from the optimal solution of \(\mathcal {L}(\mathbf {w}^*_{-R};S_n {\setminus } R)\), and the last inequality follows from the \(2B(n_+ n_- - (n_+ - r_+)(n_--r_-))(\lambda +4)\)-Lipschitzness of \(\varDelta (\mathbf {w})\). For \(\lambda \)-strongly convex function \(\mathcal {L}(\mathbf {w};S_n)\), we have

$$\begin{aligned} \mathcal {L}(\mathbf {w}^*_{-R};S_n) - \mathcal {L}(\mathbf {w}^*;S_n) \ge \frac{\lambda }{2} \Vert \mathbf {w}^*_{-R} - \mathbf {w}^*\Vert ^2 \end{aligned}$$
(14)

from \(\nabla \mathcal {L}(\mathbf {w}^*;S_n)=0\). Combining Eq. (13) and  (14) completes the proof.

It is necessary to introduce the following lemma from [8]:

Lemma 3

Let Z be the sketch matrix of X computed by the frequent directions algorithm. We have

$$ \left\| X[X]^\top - Z[Z]^\top \right\| \le {2 tr( X[X]^\top )}/{m} $$

where m is the sketch size.
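
For reference, the following is a compact re-implementation sketch of the frequent directions algorithm of [8], written with data points stored as rows (Lemma 3 states the equivalent column form); the function name and the final numerical check are illustrative and not part of the authors' code.

```python
# Compact re-implementation sketch of frequent directions [8]; data points are
# rows of X and the returned sketch B satisfies B^T B ≈ X^T X.
import numpy as np

def frequent_directions(X, m):
    """Stream the rows of X into an m x d sketch B."""
    _, d = X.shape
    B = np.zeros((m, d))
    free = 0                                   # index of the next empty row
    for x in X:
        B[free] = x
        free += 1
        if free == m:                          # buffer full: shrink the sketch
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2                 # smallest squared singular value
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt                # last row becomes zero again
            free = m - 1
    return B

# Quick check against the spectral-norm bound of Lemma 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
B = frequent_directions(X, m=20)
gap = np.linalg.norm(X.T @ X - B.T @ B, 2)
print(gap <= 2 * np.trace(X.T @ X) / 20)       # expected: True
```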

Proof of Theorem 2.

Proof

Let \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) be the loss obtained by replacing the covariance matrices \(\mathcal {S}^+_{-R}\) and \(\mathcal {S}^-_{-R}\) with \(\hat{\mathcal {S}}^+_{-R}\) and \(\hat{\mathcal {S}}^-_{-R}\), and let \(\hat{\mathbf {w}}^*_{-R} = \mathop {\arg \min }_{\mathbf {w}} \{ \hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R) \}\). We have

$$\begin{aligned} \Vert \hat{\mathbf {w}}_{-R} - \mathbf {w}^*_{-R} \Vert \le \left\| \hat{\mathbf {w}}_{-R} - \hat{\mathbf {w}}^*_{-R} \right\| + \left\| \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R} \right\| . \end{aligned}$$
(15)

Combining this with Lemma 1, it follows that

$$\begin{aligned} \Vert \hat{\mathbf {w}}_{-R} - \mathbf {w}^*_{-R} \Vert \le \left( \frac{2}{\lambda + 2} \right) ^T \left\| \mathbf {w}^* -\mathbf {w}^*_{-R} \right\| + \left( 1+\left( \frac{2}{\lambda +2} \right) ^T \right) \left\| \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R} \right\| . \end{aligned}$$
(16)

To bound \(\Vert \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R}\Vert \), we first rewrite \(\mathcal {L}(\mathbf {w};S_n{\setminus } R)\) as

$$ \mathcal {L}(\mathbf {w};S_n{\setminus } R) = \mathbf {w}^\top (A_1 + A_2)\mathbf {w}+ \mathbf {w}^\top \mathbf {a}+ 1/ 2 $$

where \(\mathbf {a}= \mathbf {c}^-_{-R} - \mathbf {c}^+_{-R}\), \(A_1 = \mathcal {S}^+_{-R} + \mathcal {S}^-_{-R} + \lambda \mathbf {I}_d\) and \(A_2 = (\mathbf {c}^-_{-R} - \mathbf {c}^+_{-R})(\mathbf {c}^-_{-R} - \mathbf {c}^+_{-R})^\top \). Similarly, we rewrite \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) as

$$ \hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R) = \mathbf {w}^\top (\hat{A}_1 + A_2)\mathbf {w}+ \mathbf {w}^\top \mathbf {a}+ 1 /2\quad \text {with}\quad \hat{A}_1 = \hat{\mathcal {S}}^+_{-R} + \hat{\mathcal {S}}^-_{-R} + \lambda \mathbf {I}_d. $$

Minimizing \(\mathcal {L}(\mathbf {w};S_n{\setminus } R)\) and \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) gives

$$ \mathbf {w}^*_{-R} = (A_1 + A_2)^{-1}\mathbf {a}\quad \text {and} \quad \hat{\mathbf {w}}^*_{-R} = (\hat{A}_1 + A_2)^{-1}\mathbf {a}, \quad \text {respectively}. $$

It is easy to verify that

$$\begin{aligned}&\left\| (A_1 + A_2)^{1/2} (\hat{A_1}+A_2)^{-1}(A_1 + A_2)^{1/2} - \mathbf {I}_d \right\| \\ =&\left\| (\hat{A}_1 + A_2)^{-1/2} (A_1-\hat{A}_1) (\hat{A}_1 + A_2)^{-1/2} \right\| \le \left\| A_1-\hat{A}_1 \right\| \left\| (\hat{A_1}+A_2)^{-1}\right\| \le \frac{2\tau }{\lambda m }, \end{aligned}$$

where \(\tau = \max (rank(X^+_n[X^+_n]^\top ),rank(X^-_n[X^-_n]^\top ))\), and the inequality comes from Lemma 3. Denoting \(\varOmega = (A_1 + A_2)^{1/2} (\hat{A_1}+A_2)^{-1}(A_1 + A_2)^{1/2} - \mathbf {I}_d \), we have

$$\begin{aligned}&\left\| \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R} \right\| = \left\| \left( (\hat{A}_1 + A_2) ^{-1} - \left( A_1 + A_2 \right) ^{-1}\right) \mathbf {a}\right\| \nonumber \\ =&\left\| (A_1+A_2)^{-1/2} \varOmega (A_1+A_2)^{-1/2} \mathbf {a}\right\| \nonumber \le \frac{2\tau }{\lambda m } \left\| (A_1 + A_2)^{-1} \mathbf {a}\right\| \le \frac{2\tau }{\lambda m } B, \end{aligned}$$

which completes the proof by combining with Eq. (16) and Lemma 2.
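
To avoid forming \((\hat{A}_1 + A_2)^{-1}\) explicitly, the adjusted model can instead be obtained by running the gradient descent of Lemma 1 on the quadratic objective rewritten above. The sketch below illustrates this for a quadratic of the form \(\frac{1}{2}\mathbf {w}^\top H \mathbf {w}- \mathbf {a}^\top \mathbf {w}\); the normalization and the eigenvalue-based step size (rather than the analytic constants \(\lambda \) and \(4+\lambda \) used in the paper) are simplifying assumptions for illustration.

```python
# Schematic illustration: recover the adjusted model by gradient descent instead
# of inverting the Hessian. The quadratic f(w) = 0.5 * w^T H w - a^T w and the
# hypothetical values of H and a below are assumptions made for illustration.
import numpy as np

def solve_quadratic_by_gd(H, a, T=100):
    """Minimize 0.5 * w^T H w - a^T w without forming H^{-1}."""
    eigs = np.linalg.eigvalsh(H)
    mu, beta = eigs[0], eigs[-1]          # strong convexity / smoothness of f
    eta = 2.0 / (beta + mu)               # step size from Lemma 1
    w = np.zeros(a.shape[0])
    for _ in range(T):
        w = w - eta * (H @ w - a)         # gradient of f at w
    return w

# Example with a sketched, regularized second-moment matrix playing the role of
# A_1 + A_2.
rng = np.random.default_rng(0)
d, lam = 30, 1.0
C = rng.standard_normal((d, d))
H = C @ C.T / d + lam * np.eye(d)
a = rng.standard_normal(d)
w_gd = solve_quadratic_by_gd(H, a, T=200)
print(np.linalg.norm(w_gd - np.linalg.solve(H, a)))   # small residual
```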

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Li, J., Guo, JQ., Gao, W. (2022). Data Removal from an AUC Optimization Model. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_18

  • DOI: https://doi.org/10.1007/978-3-031-05933-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05932-2

  • Online ISBN: 978-3-031-05933-9
