Abstract
A learned model may need to be adjusted dynamically when training data are removed, e.g., for privacy protection or defense against adversarial attacks. Previous studies on this problem mostly focus on standard classification accuracy. This work takes a step towards data removal for AUC optimization, where previous methods cannot be applied directly since AUC is measured by a sum of losses defined over pairs of instances from different classes. We develop the Data Removal algorithm for AUC optimization (DRAUC); the basic idea is to adjust the trained model according to the removed data, rather than retraining a model from scratch. Our algorithm only needs to maintain some data statistics, without storing the training data in memory. For high-dimensional data, we exploit the frequent directions algorithm to approximate the second-order statistics, and compute the solution numerically by gradient descent so as to avoid calculating the inverse of the Hessian matrix. We verify the effectiveness of the proposed DRAUC both theoretically and empirically.
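To make the removal scheme concrete, here is a minimal Python sketch of the idea described above. Everything in it is an illustrative assumption rather than the authors' implementation: the class name `AUCRemovalSketch`, the simplified least-squares pairwise surrogate, and all hyperparameters are hypothetical.

```python
import numpy as np

class AUCRemovalSketch:
    """Hypothetical illustration of the DRAUC idea: keep only class-wise
    statistics, downdate them on removal, and warm-start gradient descent."""

    def __init__(self, X_pos, X_neg, lam=1.0):
        self.lam = lam
        self.n_pos, self.n_neg = len(X_pos), len(X_neg)
        self.sum_pos, self.sum_neg = X_pos.sum(axis=0), X_neg.sum(axis=0)
        self.mom_pos, self.mom_neg = X_pos.T @ X_pos, X_neg.T @ X_neg
        self.w = np.zeros(X_pos.shape[1])

    def _grad(self, w):
        # Gradient of an assumed least-squares pairwise surrogate:
        # 0.5*(1 - w'(c+ - c-))^2 + 0.5*w'(S+ + S-)w + 0.5*lam*||w||^2
        c_pos, c_neg = self.sum_pos / self.n_pos, self.sum_neg / self.n_neg
        S = self.mom_pos / self.n_pos + self.mom_neg / self.n_neg
        diff = c_pos - c_neg
        return S @ w - diff * (1.0 - w @ diff) + self.lam * w

    def fit(self, steps=200, eta=0.05):
        for _ in range(steps):
            self.w -= eta * self._grad(self.w)
        return self.w

    def remove(self, X_pos_del, X_neg_del, steps=20, eta=0.05):
        # Downdate the maintained statistics; the raw data need not be stored.
        self.n_pos -= len(X_pos_del); self.n_neg -= len(X_neg_del)
        self.sum_pos -= X_pos_del.sum(axis=0)
        self.sum_neg -= X_neg_del.sum(axis=0)
        self.mom_pos -= X_pos_del.T @ X_pos_del
        self.mom_neg -= X_neg_del.T @ X_neg_del
        # Adjust the already-trained w with a few gradient steps
        # instead of retraining from scratch.
        return self.fit(steps=steps, eta=eta)
```

The point of the design shows in `remove`: the update touches only the O(d^2) statistics and runs a few warm-started gradient steps, so the removed examples themselves are never needed again.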
References
Brzezinski, D., Stefanowski, J.: Prequential AUC: properties of the area under the ROC curve for data streams with concept drift. Knowl. Inf. Syst. 52(2), 531–562 (2017). https://doi.org/10.1007/s10115-017-1022-8
Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (2015)
Cao, Y., Yang, J.: Towards making systems forget with machine unlearning. In: 2015 IEEE Symposium on Security and Privacy, pp. 463–480. IEEE (2015)
Cook, R., Weisberg, S.: Residuals and Influence in Regression. Chapman and Hall, New York (1982)
Flach, P., Hernández-Orallo, J., Ramirez, C.: A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th International Conference on Machine Learning, pp. 657–664 (2011)
Gao, W., Jin, R., Zhu, S., Zhou, Z.H.: One-pass AUC optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 906–914 (2013)
Gao, W., Zhou, Z.H.: On the consistency of AUC pairwise optimization. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 939–945 (2015)
Ghashami, M., Liberty, E., Phillips, J., Woodruff, D.: Frequent directions: simple and deterministic matrix sketching. SIAM J. Comput. 45(5), 1762–1792 (2016)
Ginart, A., Guan, M., Valiant, G., Zou, J.: Making AI forget you: data deletion in machine learning. In: Advances in Neural Information Processing Systems, vol. 32, pp. 3518–3531 (2019)
Guo, C., Goldstein, T., Hannun, A., van der Maaten, L.: Certified data removal from machine learning models. In: Proceedings of the 37th International Conference on Machine Learning, pp. 3832–3842 (2020)
Herschtal, A., Raskutti, B.: Optimising area under the ROC curve using gradient descent. In: Proceedings of the 21st International Conference on Machine Learning, p. 49 (2004)
Izzo, Z., Smart, M.A., Chaudhuri, K., Zou, J.: Approximate data deletion from machine learning models. In: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, pp. 2008–2016 (2021)
Koh, P., Liang, P.: Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1885–1894 (2017)
Liu, M., Yuan, Z., Ying, Y., Yang, T.: Stochastic AUC maximization with deep neural networks. In: Proceedings of the 8th International Conference on Learning Representations (2020)
Liu, M., Zhang, X., Chen, Z., Wang, X., Yang, T.: Fast stochastic AUC maximization with \(O(1/n)\)-convergence rate. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3189–3197 (2018)
Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.: Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inform. 19(6), 1893–1905 (2015)
Natole, M., Ying, Y., Lyu, S.: Stochastic proximal algorithms for AUC maximization. In: Proceedings of the 35th International Conference on Machine Learning, pp. 3710–3719. PMLR (2018)
Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the 15th International Conference on Machine Learning, pp. 445–453 (1998)
Shen, S.Q., Yang, B.B., Gao, W.: AUC optimization with a reject option. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 5684–5691 (2020)
Shokri, R., Stronati, M., Song, C., Shmatikov, V.: Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy, pp. 3–18 (2017)
Voigt, P., Von dem Bussche, A.: The EU General Data Protection Regulation (GDPR), vol. 10. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57959-7
Wu, J., Brubaker, S., Mullin, M., Rehg, J.: Fast asymmetric learning for cascade face detection. IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 369–382 (2008)
Wu, Y., Dobriban, E., Davidson, S.: DeltaGrad: rapid retraining of machine learning models. In: Proceedings of the 37th International Conference on Machine Learning, pp. 10355–10366 (2020)
Ying, Y., Wen, L., Lyu, S.: Stochastic online AUC maximization. In: Advances in Neural Information Processing Systems, vol. 29, pp. 451–459 (2016)
Acknowledgement
The authors want to thank the anonymous reviewers for helpful comments and suggestions. This research is supported by the National Natural Science Foundation of China (61921006, 61876078).
A Analysis of DRAUC with High-Dimensional Data
We introduce the following lemma [2] for strongly convex and smooth functions:
Lemma 1
Let \(\mathbf {w}^* = \arg \min _{\mathbf {w}} \mathcal {L}(\mathbf {w})\) for a \(\mu \)-strongly convex and \(\beta \)-smooth function \(\mathcal {L}(\mathbf {w})\). After t iterations of gradient descent with step size \(\eta _t = {2}/(\beta +\mu )\), we have
$$ \Vert \mathbf {w}_{t+1} - \mathbf {w}^*\Vert \le \left( \frac{\kappa - 1}{\kappa + 1}\right) ^{t} \Vert \mathbf {w}_{1} - \mathbf {w}^*\Vert , $$
where \(\kappa = \beta /\mu \) denotes the condition number.
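As a quick numerical illustration of Lemma 1 (a sketch on an assumed toy quadratic, not part of the paper), gradient descent with step size \(2/(\beta +\mu )\) contracts the distance to the optimum by \((\kappa -1)/(\kappa +1)\) per iteration:

```python
import numpy as np

mu, beta = 1.0, 10.0
H = np.diag(np.linspace(mu, beta, 5))   # Hessian spectrum in [mu, beta]
b = np.ones(5)
w_star = np.linalg.solve(H, b)          # minimizer of 0.5*w'Hw - b'w

eta, kappa = 2.0 / (beta + mu), beta / mu
w = np.zeros(5)
for t in range(1, 51):
    w = w - eta * (H @ w - b)           # one gradient-descent step
    rate = ((kappa - 1.0) / (kappa + 1.0)) ** t
    # distance to the optimum shrinks geometrically, matching Lemma 1
    assert np.linalg.norm(w - w_star) <= rate * np.linalg.norm(w_star) + 1e-12
```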
We introduce a lemma for AUC optimization as follows:
Lemma 2
For the bounded space \(\mathcal {W}=\{\mathbf {w}:\Vert \mathbf {w}\Vert \le B\}\), let \(\mathbf {w}^*=\arg \min _{\mathbf {w}\in \mathcal {W}} \mathcal {L}(\mathbf {w};S_n)\) and \(\mathbf {w}^*_{-R}=\arg \min _{\mathbf {w}\in \mathcal {W}}\mathcal {L}(\mathbf {w};S_n {\setminus } R)\). For regularization parameter \(\lambda >0\), we have
$$ \Vert \mathbf {w}^* - \mathbf {w}^*_{-R}\Vert \le 2B\sqrt{\frac{2(\lambda +4)\left( n_+ n_- - (n_+ - r_+)(n_- - r_-)\right) }{\lambda \, n_+ n_-}}. $$
Proof
From the definition of \(\mathcal {L}(\mathbf {w};S_n)\), we have
$$ \Vert \nabla \mathcal {L}(\mathbf {w}_1;S_n) - \nabla \mathcal {L}(\mathbf {w}_2;S_n)\Vert \le (4+\lambda )\Vert \mathbf {w}_1 - \mathbf {w}_2\Vert , $$
where \(\mathbf {w}_1, \mathbf {w}_2\in \mathcal {W}\), \(\Vert \mathbf {x}_i\Vert \le 1\) and \(\Vert \mathbf {x}_j\Vert \le 1\), and thus \(\mathcal {L}(\mathbf {w};S_n)\) is \((4+ \lambda )\)-smooth. From Cauchy's mean-value theorem, we have
$$ \mathcal {L}(\mathbf {w}_1;S_n) - \mathcal {L}(\mathbf {w}_2;S_n) = \left\langle \nabla \mathcal {L}(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2;S_n),\, \mathbf {w}_1 - \mathbf {w}_2 \right\rangle \quad (12) $$
where \(\kappa \in [0,1]\) and \(\kappa \mathbf {w}_1 + (1-\kappa )\mathbf {w}_2 \in \mathcal {W}\). We also have \(\nabla \mathcal {L}(\mathbf {w}^*;S_n) = 0\), and it holds that
$$ \Vert \nabla \mathcal {L}(\mathbf {w};S_n)\Vert = \Vert \nabla \mathcal {L}(\mathbf {w};S_n) - \nabla \mathcal {L}(\mathbf {w}^*;S_n)\Vert \le (4+\lambda )\Vert \mathbf {w} - \mathbf {w}^*\Vert \le 2B(4+\lambda ), $$
which yields that \(\mathcal {L}(\mathbf {w};S_n)\) is \(2B(4+\lambda )\)-Lipschitz from Eq. (12). Recall that \(\varDelta (\mathbf {w}) = n_+ n_- \mathcal {L}(\mathbf {w}; S_n) - (n_+-r_+)(n_--r_-)\mathcal {L}(\mathbf {w};S_n {\setminus } R )\), and we have
$$ n_+ n_- \left( \mathcal {L}(\mathbf {w}^*_{-R};S_n) - \mathcal {L}(\mathbf {w}^*;S_n) \right) \le \varDelta (\mathbf {w}^*_{-R}) - \varDelta (\mathbf {w}^*) \le 4B^2(\lambda +4)\left( n_+ n_- - (n_+ - r_+)(n_- - r_-) \right) \quad (13) $$
where the first inequality holds from the optimality of \(\mathbf {w}^*_{-R}\) for \(\mathcal {L}(\mathbf {w};S_n {\setminus } R)\), and the last inequality follows from the \(2B(n_+ n_- - (n_+ - r_+)(n_--r_-))(\lambda +4)\)-Lipschitzness of \(\varDelta (\mathbf {w})\). For the \(\lambda \)-strongly convex function \(\mathcal {L}(\mathbf {w};S_n)\), we have
$$ \mathcal {L}(\mathbf {w}^*_{-R};S_n) - \mathcal {L}(\mathbf {w}^*;S_n) \ge \frac{\lambda }{2}\Vert \mathbf {w}^*_{-R} - \mathbf {w}^*\Vert ^2 \quad (14) $$
from \(\nabla \mathcal {L}(\mathbf {w}^*;S_n)=0\). Combining Eq. (13) and (14) completes the proof.
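Concretely, the combination step spelled out (a worked restatement of chaining Eq. (14) with Eq. (13)) reads
$$ \frac{\lambda }{2}\Vert \mathbf {w}^*_{-R} - \mathbf {w}^*\Vert ^2 \le \frac{4B^2(\lambda +4)\left( n_+ n_- - (n_+ - r_+)(n_- - r_-)\right) }{n_+ n_-}, $$
which rearranges to the bound stated in Lemma 2.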
It is necessary to introduce the following lemma from [8]:
Lemma 3
Let Z be the sketch matrix of X using frequent directions. We have
$$ 0 \preceq X^\top X - Z^\top Z \preceq \frac{\Vert X\Vert _F^2}{m}\, \mathbf {I}_d, $$
where m is the sketch size.
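The following Python sketch shows a simplified (shrink-at-every-step) variant of frequent directions consistent with this guarantee; the function name and the dense update are illustrative assumptions, not the implementation from [8]:

```python
import numpy as np

def frequent_directions(X, m):
    """Simplified frequent-directions sketch: returns Z (m x d) with
    X'X - Z'Z PSD and spectral norm at most ||X||_F^2 / m (for d >= m)."""
    n, d = X.shape
    Z = np.zeros((m, d))
    for row in X:
        Z[-1] = row                                # insert row into the free slot
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s2 = np.maximum(s ** 2 - s[-1] ** 2, 0.0)  # shrink squared singular values
        Z = np.zeros((m, d))
        Z[: len(s)] = np.sqrt(s2)[:, None] * Vt    # smallest direction zeroed out
    return Z

# Usage: the sketch error obeys the Lemma 3 bound on random data.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
Z = frequent_directions(X, m=10)
err = np.linalg.norm(X.T @ X - Z.T @ Z, 2)
assert err <= np.linalg.norm(X, "fro") ** 2 / 10
```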
Proof of Theorem 2.
Proof
Let \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) be the loss obtained by replacing the covariance matrices \(\mathcal {S}^+_{-R}\) and \(\mathcal {S}^-_{-R}\) with \(\hat{\mathcal {S}}^+_{-R}\) and \(\hat{\mathcal {S}}^-_{-R}\), and let \(\hat{\mathbf {w}}^*_{-R} = \mathop {\arg \min }_{\mathbf {w}} \{ \hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R) \}\). We have
Combining with Lemma 1, it follows that
To bound \(\Vert \hat{\mathbf {w}}^*_{-R} - \mathbf {w}^*_{-R}\Vert \), we first rewrite \(\mathcal {L}(\mathbf {w};S_n{\setminus } R)\) as
where \(\mathbf {a}= \mathbf {c}^-_{-R} - \mathbf {c}^+_{-R}\), \(A_1 = \mathcal {S}^+_{-R} + \mathcal {S}^-_{-R} + \lambda \mathbf {I}_d\) and \(A_2 = (\mathbf {c}^-_{-R} - \mathbf {c}^+_{-R})(\mathbf {c}^-_{-R} - \mathbf {c}^+_{-R})^\top \). Similarly, we rewrite \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) as
Minimizing \(\mathcal {L}(\mathbf {w};S_n{\setminus } R)\) and \(\hat{\mathcal {L}}(\mathbf {w};S_n{\setminus } R)\) gives
It is straightforward to obtain
where \(\tau = \max (\mathrm {rank}(X^+_n[X^+_n]^\top ), \mathrm {rank}(X^-_n[X^-_n]^\top ))\), and the inequality comes from Lemma 3. Denoting \(\varOmega = (A_1 + A_2)^{1/2} (\hat{A}_1+A_2)^{-1}(A_1 + A_2)^{1/2} - \mathbf {I}_d \), we have
which completes the proof by combining with Eq. (16) and Lemma 2.
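To illustrate why the gradient-based solver in DRAUC can avoid inverting \(A_1 + A_2\), the sketch below compares the closed-form minimizer of a toy quadratic of the same shape with gradient descent; all data and coefficients here are synthetic assumptions, and the exact sign conventions of the paper's loss are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
d, lam = 20, 1.0
M = rng.standard_normal((d, d))
A1 = M @ M.T / d + lam * np.eye(d)          # plays the role of S+ + S- + lam*I
a = rng.standard_normal(d) / d              # plays the role of c- - c+
A2 = np.outer(a, a)
H = A1 + A2                                 # curvature of the quadratic loss

w_exact = -np.linalg.solve(H, a)            # closed form: needs a d x d solve

beta = np.linalg.eigvalsh(H).max()          # smoothness constant
mu = np.linalg.eigvalsh(H).min()            # strong convexity constant (>= lam)
w = np.zeros(d)                             # warm start would be the old model
for _ in range(200):
    w -= (2.0 / (beta + mu)) * (H @ w + a)  # gradient of 0.5*w'Hw + a'w
print(np.linalg.norm(w - w_exact))          # close to 0: no inverse of H needed
```

By Lemma 1 the iterates converge linearly, so a modest number of gradient steps recovers the closed-form solution to machine precision.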
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Guo, J.Q., Gao, W. (2022). Data Removal from an AUC Optimization Model. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol. 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_18
DOI: https://doi.org/10.1007/978-3-031-05933-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05932-2
Online ISBN: 978-3-031-05933-9
eBook Packages: Computer Science (R0)