
Data Removal from an AUC Optimization Model

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13280)

Abstract

A learned model may need to be adjusted dynamically when training data are removed, for example for privacy reasons or in adversarial learning. Previous studies on this issue mostly focus on standard classification accuracy. This work takes a step towards data removal for AUC optimization, where previous methods cannot be applied directly since AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop the Data Removal algorithm for AUC optimization (DRAUC); the basic idea is to adjust the trained model according to the removed data rather than retrain another model from scratch. Our algorithm only needs to maintain some data statistics, without storing the training data in memory. For high-dimensional data, we utilize the frequent directions algorithm to approximate the second-order statistics and compute the solution numerically by gradient descent, so as to avoid calculating the inverse of the Hessian matrix. We verify the effectiveness of the proposed DRAUC both theoretically and empirically.
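
As a concrete illustration of the statistics-maintenance idea, the sketch below keeps per-class counts, feature sums, and second-moment matrices, so a batch of removals can be absorbed by subtracting its contribution without revisiting the raw training set. The class layout, names, and choice of statistics here are illustrative assumptions, not the authors' released implementation (see footnote 1).

```python
# Illustrative sketch (not the authors' implementation): per-class sufficient
# statistics of the kind needed to rebuild the quadratic objective in Appendix A.
# Removing a batch of instances only subtracts its contribution, so the raw
# training set never has to be kept in memory.
import numpy as np

class ClassStats:
    def __init__(self, d):
        self.n = 0                         # number of instances in this class
        self.s = np.zeros(d)               # running feature sum
        self.m2 = np.zeros((d, d))         # running sum of outer products x x^T

    def add(self, X):
        self.n += X.shape[0]
        self.s += X.sum(axis=0)
        self.m2 += X.T @ X

    def remove(self, X_removed):           # data removal: subtract contributions
        self.n -= X_removed.shape[0]
        self.s -= X_removed.sum(axis=0)
        self.m2 -= X_removed.T @ X_removed

    def mean(self):
        return self.s / self.n

    def covariance(self):
        c = self.mean()
        return self.m2 / self.n - np.outer(c, c)
```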


Notes

  1. https://github.com/sven-lijie/DRAUC.

  2. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.

  3. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.

References

  1. Brzezinski, D., Stefanowski, J.: Prequential AUC: properties of the area under the ROC curve for data streams with concept drift. Knowl. Inf. Syst. 52(2), 531–562 (2017). https://doi.org/10.1007/s10115-017-1022-8

  2. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)

  3. Cao, Y., Yang, J.: Towards making systems forget with machine unlearning. In: 2015 IEEE Symposium on Security and Privacy, pp. 463–480. IEEE (2015)

  4. Cook, R., Weisberg, S.: Residuals and Influence in Regression. Chapman and Hall, New York (1982)

  5. Flach, P., Hernández-Orallo, J., Ramirez, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th International Conference on Machine Learning, pp. 657–664 (2011)

  6. Gao, W., Jin, R., Zhu, S., Zhou, Z.H.: One-pass AUC optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 906–914 (2013)

  7. Gao, W., Zhou, Z.H.: On the consistency of AUC pairwise optimization. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 939–945 (2015)

  8. Ghashami, M., Liberty, E., Phillips, J., Woodruff, D.: Frequent directions: simple and deterministic matrix sketching. SIAM J. Comput. 45(5), 1762–1792 (2016)

  9. Ginart, A., Guan, M., Valiant, G., Zou, J.: Making AI forget you: data deletion in machine learning. In: Advances in Neural Information Processing Systems, vol. 32, pp. 3518–3531 (2019)

  10. Guo, C., Goldstein, T., Hannun, A., van der Maaten, L.: Certified data removal from machine learning models. In: Proceedings of the 37th International Conference on Machine Learning, pp. 3832–3842 (2020)

  11. Herschtal, A., Raskutti, B.: Optimising area under the ROC curve using gradient descent. In: Proceedings of the 21st International Conference on Machine Learning, p. 49 (2004)

  12. Izzo, Z., Anne Smart, M., Chaudhuri, K., Zou, J.: Approximate data deletion from machine learning models. In: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, pp. 2008–2016 (2021)

  13. Koh, P., Liang, P.: Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1885–1894 (2017)

  14. Liu, M., Yuan, Z., Ying, Y., Yang, T.: Stochastic AUC maximization with deep neural networks. In: Proceedings of the 8th International Conference on Learning Representations (2020)

  15. Liu, M., Zhang, X., Chen, Z., Wang, X., Yang, T.: Fast stochastic AUC maximization with \(O(1/n)\)-convergence rate. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3189–3197 (2018)

  16. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.: Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inform. 19(6), 1893–1905 (2014)

  17. Natole, M., Ying, Y., Lyu, S.: Stochastic proximal algorithms for AUC maximization. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3710–3719 (2018)

  18. Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the 15th International Conference on Machine Learning, pp. 445–453 (1998)

  19. Shen, S.Q., Yang, B.B., Gao, W.: AUC optimization with a reject option. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 5684–5691 (2020)

  20. Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy, pp. 3–18 (2017)

  21. Voigt, P., Von dem Bussche, A.: The EU General Data Protection Regulation (GDPR), vol. 10. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57959-7

  22. Wu, J., Brubaker, S., Mullin, M., Rehg, J.: Fast asymmetric learning for cascade face detection. IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 369–382 (2008)

  23. Wu, Y., Dobriban, E., Davidson, S.: DeltaGrad: rapid retraining of machine learning models. In: Proceedings of the 37th International Conference on Machine Learning, pp. 10355–10366 (2020)

  24. Ying, Y., Wen, L., Lyu, S.: Stochastic online AUC maximization. In: Advances in Neural Information Processing Systems, vol. 29, pp. 451–459 (2016)

Acknowledgement

The authors thank the anonymous reviewers for their helpful comments and suggestions. This research was supported by the National Science Foundation of China (61921006, 61876078).

Author information

Correspondence to Wei Gao.

A Analysis of DRAUC with High-Dimensional Data

We introduce the following lemma [2] for strongly convex and smooth functions:

Lemma 1

Let \(\mathbf {w}^* = \arg \min _{\mathbf {w}} \mathcal {L}(\mathbf {w})\) w.r.t. \(\mu \)-strongly convex and \(\beta \)-smooth function \(\mathcal {L}(\mathbf {w})\). After t iterations of gradient descent with step size \(\eta _t = {2}/(\beta +\mu )\), we have

$$ \Vert \mathbf {w}_t - \mathbf {w}^* \Vert \le \left( \frac{\beta - \mu }{\beta + \mu } \right) ^t \Vert \mathbf {w}_0 - \mathbf {w}^*\Vert . $$
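
The contraction rate in Lemma 1 can be checked numerically. The snippet below runs gradient descent with step size \(2/(\beta +\mu )\) on a randomly generated strongly convex quadratic (a hypothetical example, not data from the paper) and verifies that the distance to the minimizer shrinks at least as fast as \(((\beta -\mu )/(\beta +\mu ))^t\).

```python
# Numerical check of Lemma 1 on a hypothetical strongly convex quadratic
# L(w) = 0.5 * w^T A w - b^T w, whose minimizer is w* = A^{-1} b.
import numpy as np

rng = np.random.default_rng(0)
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T + 0.5 * np.eye(d)            # Hessian: strongly convex and smooth
b = rng.standard_normal(d)

eigs = np.linalg.eigvalsh(A)
mu, beta = eigs[0], eigs[-1]             # strong convexity / smoothness constants
eta = 2.0 / (beta + mu)                  # step size from Lemma 1
rate = (beta - mu) / (beta + mu)         # guaranteed contraction per iteration

w_star = np.linalg.solve(A, b)
w = np.zeros(d)
err0 = np.linalg.norm(w - w_star)
for t in range(1, 51):
    w = w - eta * (A @ w - b)            # one gradient step
    assert np.linalg.norm(w - w_star) <= rate ** t * err0 + 1e-9
```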

We introduce a lemma for AUC optimization as follows:

Lemma 2

For bounded space \(\mathcal {W}=\{\mathbf {w}:\Vert \mathbf {w}\Vert \le B\}\), let \(\mathbf {w}^*=\arg \min _{\mathbf {w}\in \mathcal {W}} \mathcal {L}(\mathbf {w};S_n)\) and \(\mathbf {w}^*_{-R}=\arg \min _{\mathbf {w}\in \mathcal {W}}\mathcal {L}(\mathbf {w};S_n {\setminus } R)\). For regularization parameter \(\lambda >0\), we have

$$ \Vert \mathbf {w}^* -\mathbf {w}^*_{-R} \Vert \le 4B \frac{4 + \lambda }{\lambda }\left( \frac{r_+}{n_+}+\frac{r_-}{n_-}-\frac{r_+r_-}{n_+n_-}\right) . $$

Proof

From the definition of \(\mathcal {L}(\mathbf {w};S_n)\), we have

$$ \Vert \nabla \mathcal {L}(\mathbf {w}_1;S_n) - \nabla \mathcal {L}(\mathbf {w}_2;S_n) \Vert \le (4+\lambda ) \Vert \mathbf {w}_1- \mathbf {w}_2\Vert , $$

where \(\mathbf {w}_1, \mathbf {w}_2\in \mathcal {W}\), \(\Vert \mathbf {x}_i\Vert \le 1\) and \(\Vert \mathbf {x}_j\Vert \le 1\), and thus \(\mathcal {L}(\mathbf {w};S_n)\) is \((4+ \lambda )\)-smooth. From Cauchy’s mean-value theorem, we have

$$\begin{aligned} \left| \mathcal {L}(\mathbf {w}_1;S_n) - \mathcal {L}(\mathbf {w}_2;S_n) \right|= & {} | \nabla \mathcal {L}^\top (\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 ;S_n) (\mathbf {w}_1 - \mathbf {w}_2) | \nonumber \\\le & {} \Vert \nabla \mathcal {L}(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 ;S_n) \Vert \Vert \mathbf {w}_1 - \mathbf {w}_2 \Vert \end{aligned}$$
(12)

where \(\kappa \in [0,1]\) and \(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 \in \mathcal {W}\). We also have \(\nabla \mathcal {L}(\mathbf {w}^*;S_n) = 0\), and it holds that

$$\begin{aligned} \Vert \nabla \mathcal {L}(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 ;S_n) \Vert \le \max _{\mathbf {w}\in \mathcal {W}} \Vert \nabla \mathcal {L}(\mathbf {w};S_n) - \nabla \mathcal {L}(\mathbf {w}^*;S_n) \Vert \le 2B(4+\lambda ) \end{aligned}$$

which yields that \(\mathcal {L}(\mathbf {w};S_n)\) is \(2B(4+\lambda )\)-Lipschitz from Eq. (12). Recall that \(\varDelta (\mathbf {w}) = n_+ n_- \mathcal {L}(\mathbf {w}; S_n) - (n_+-r_+)(n_--r_-)\mathcal {L}(\mathbf {w};S_n {\setminus } R )\), and we have

$$\begin{aligned} n_+ n_- \left( \mathcal {L}(\mathbf {w}^*_{-R};S_n) - \mathcal {L}(\mathbf {w}^*;S_n) \right)= & {} \varDelta (\mathbf {w}^*_{-R}) - \varDelta (\mathbf {w}^*) + (n_+ - r_+)(n_- - r_-)\left( \mathcal {L}(\mathbf {w}^*_{-R};S_n {\setminus } R) - \mathcal {L}(\mathbf {w}^*;S_n {\setminus } R) \right) \nonumber \\\le & {} \varDelta (\mathbf {w}^*_{-R}) - \varDelta (\mathbf {w}^*) \nonumber \\\le & {} 2B\left( n_+ n_- - (n_+ - r_+)(n_- - r_-)\right) (\lambda +4) \Vert \mathbf {w}^*_{-R} - \mathbf {w}^*\Vert \end{aligned}$$
(13)

where the first inequality holds from the optimal solution of \(\mathcal {L}(\mathbf {w}^*_{-R};S_n {\setminus } R)\), and the last inequality follows from the \(2B(n_+ n_- - (n_+ - r_+)(n_--r_-))(\lambda +4)\)-Lipschitzness of \(\varDelta (\mathbf {w})\). For \(\lambda \)-strongly convex function \(\mathcal {L}(\mathbf {w};S_n)\), we have

$$\begin{aligned} \mathcal {L}(\mathbf {w}^*_{-R};S_n) - \mathcal {L}(\mathbf {w}^*;S_n) \ge \frac{\lambda }{2} \Vert \mathbf {w}^*_{-R} - \mathbf {w}^*\Vert ^2 \end{aligned}$$
(14)

from \(\nabla \mathcal {L}(\mathbf {w}^*;S_n)=0\). Combining Eq. (13) and  (14) completes the proof.

It is necessary to introduce the following lemma from [8]:

Lemma 3

Let Z be the sketch matrix of X computed by the frequent directions algorithm. We have

$$ \left\| X[X]^\top - Z[Z]^\top \right\| \le {2 tr( X[X]^\top )}/{m} $$

where m is the sketch size.
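
For reference, the following is a compact re-implementation sketch of the frequent directions algorithm of [8], written with data points stored as rows (Lemma 3 states the equivalent column form); the function name and the final numerical check are illustrative and not part of the authors' code.

```python
# Compact re-implementation sketch of frequent directions [8]; data points are
# rows of X and the returned sketch B satisfies B^T B ≈ X^T X.
import numpy as np

def frequent_directions(X, m):
    """Stream the rows of X into an m x d sketch B."""
    _, d = X.shape
    B = np.zeros((m, d))
    free = 0                                   # index of the next empty row
    for x in X:
        B[free] = x
        free += 1
        if free == m:                          # buffer full: shrink the sketch
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2                 # smallest squared singular value
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt                # last row becomes zero again
            free = m - 1
    return B

# Quick check against the spectral-norm bound of Lemma 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
B = frequent_directions(X, m=20)
gap = np.linalg.norm(X.T @ X - B.T @ B, 2)
print(gap <= 2 * np.trace(X.T @ X) / 20)       # expected: True
```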

Proof of Theorem 2.

Proof

Let \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) be the loss obtained by replacing the covariance matrices \(\mathcal {S}^+_{-R}\) and \(\mathcal {S}^-_{-R}\) with \(\hat{\mathcal {S}}^+_{-R}\) and \(\hat{\mathcal {S}}^-_{-R}\), and let \(\hat{\mathbf {w}}^*_{-R} = \mathop {\arg \min }_{\mathbf {w}} \{ \hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R) \}\). We have

$$\begin{aligned} \Vert \hat{\mathbf {w}}_{-R} - \mathbf {w}^*_{-R} \Vert \le \left\| \hat{\mathbf {w}}_{-R} - \hat{\mathbf {w}}^*_{-R} \right\| + \left\| \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R} \right\| . \end{aligned}$$
(15)

Combining this with Lemma 1, it follows that

$$\begin{aligned} \Vert \hat{\mathbf {w}}_{-R} - \mathbf {w}^*_{-R} \Vert \le \left( \frac{2}{\lambda + 2} \right) ^T \left\| \mathbf {w}^* -\mathbf {w}^*_{-R} \right\| + \left( 1+\left( \frac{2}{\lambda +2} \right) ^T \right) \left\| \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R} \right\| . \end{aligned}$$
(16)

To bound \(\Vert \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R}\Vert \), we first rewrite \(\mathcal {L}(\mathbf {w};S_n{\setminus } R)\) as

$$ \mathcal {L}(\mathbf {w};S_n{\setminus } R) = \mathbf {w}^\top (A_1 + A_2)\mathbf {w}+ \mathbf {w}^\top \mathbf {a}+ 1/ 2 $$

where \(\mathbf {a}= \mathbf {c}^-_{-R} - \mathbf {c}^+_{-R}\), \(A_1 = \mathcal {S}^+_{-R} + \mathcal {S}^-_{-R} + \lambda \mathbf {I}_d\) and \(A_2 = (\mathbf {c}^-_{-R} - \mathbf {c}^+_{-R})(\mathbf {c}^-_{-R} - \mathbf {c}^+_{-R})^\top \). Similarly, we rewrite \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) as

$$ \hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R) = \mathbf {w}^\top (\hat{A}_1 + A_2)\mathbf {w}+ \mathbf {w}^\top \mathbf {a}+ 1 /2\quad \text {with}\quad \hat{A}_1 = \hat{\mathcal {S}}^+_{-R} + \hat{\mathcal {S}}^-_{-R} + \lambda \mathbf {I}_d. $$

Minimizing \(\mathcal {L}(\mathbf {w};S_n{\setminus } R)\) and \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) gives

$$ \mathbf {w}^*_{-R} = (A_1 + A_2)^{-1}\mathbf {a}\quad \text {and} \quad \hat{\mathbf {w}}^*_{-R} = (\hat{A}_1 + A_2)^{-1}\mathbf {a}, \quad \text {respectively}. $$

It is easy to verify that

$$\begin{aligned}&\left\| (A_1 + A_2)^{1/2} (\hat{A_1}+A_2)^{-1}(A_1 + A_2)^{1/2} - \mathbf {I}_d \right\| \\ =&\left\| (\hat{A}_1 + A_2)^{-1/2} (A_1-\hat{A}_1) (\hat{A}_1 + A_2)^{-1/2} \right\| \le \left\| A_1-\hat{A}_1 \right\| \left\| (\hat{A_1}+A_2)^{-1}\right\| \le \frac{2\tau }{\lambda m }, \end{aligned}$$

where \(\tau = \max (rank(X^+_n[X^+_n]^\top ),rank(X^-_n[X^-_n]^\top ))\), and the inequality comes from Lemma 3. Denoting \(\varOmega = (A_1 + A_2)^{1/2} (\hat{A_1}+A_2)^{-1}(A_1 + A_2)^{1/2} - \mathbf {I}_d \), we have

$$\begin{aligned}&\left\| \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R} \right\| = \left\| \left( (\hat{A}_1 + A_2) ^{-1} - \left( A_1 + A_2 \right) ^{-1}\right) \mathbf {a}\right\| \nonumber \\ =&\left\| (A_1+A_2)^{-1/2} \varOmega (A_1+A_2)^{-1/2} \mathbf {a}\right\| \nonumber \le \frac{2\tau }{\lambda m } \left\| (A_1 + A_2)^{-1} \mathbf {a}\right\| \le \frac{2\tau }{\lambda m } B, \end{aligned}$$

which completes the proof by combining with Eq. (16) and Lemma 2.
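
To avoid forming \((\hat{A}_1 + A_2)^{-1}\) explicitly, the adjusted model can instead be obtained by running the gradient descent of Lemma 1 on the quadratic objective rewritten above. The sketch below illustrates this for a quadratic of the form \(\frac{1}{2}\mathbf {w}^\top H \mathbf {w}- \mathbf {a}^\top \mathbf {w}\); the normalization and the eigenvalue-based step size (rather than the analytic constants \(\lambda \) and \(4+\lambda \) used in the paper) are simplifying assumptions for illustration.

```python
# Schematic illustration: recover the adjusted model by gradient descent instead
# of inverting the Hessian. The quadratic f(w) = 0.5 * w^T H w - a^T w and the
# hypothetical values of H and a below are assumptions made for illustration.
import numpy as np

def solve_quadratic_by_gd(H, a, T=100):
    """Minimize 0.5 * w^T H w - a^T w without forming H^{-1}."""
    eigs = np.linalg.eigvalsh(H)
    mu, beta = eigs[0], eigs[-1]          # strong convexity / smoothness of f
    eta = 2.0 / (beta + mu)               # step size from Lemma 1
    w = np.zeros(a.shape[0])
    for _ in range(T):
        w = w - eta * (H @ w - a)         # gradient of f at w
    return w

# Example with a sketched, regularized second-moment matrix playing the role of
# A_1 + A_2.
rng = np.random.default_rng(0)
d, lam = 30, 1.0
C = rng.standard_normal((d, d))
H = C @ C.T / d + lam * np.eye(d)
a = rng.standard_normal(d)
w_gd = solve_quadratic_by_gd(H, a, T=200)
print(np.linalg.norm(w_gd - np.linalg.solve(H, a)))   # small residual
```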

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Li, J., Guo, JQ., Gao, W. (2022). Data Removal from an AUC Optimization Model. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_18

  • DOI: https://doi.org/10.1007/978-3-031-05933-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05932-2

  • Online ISBN: 978-3-031-05933-9
