Abstract
Removing the impact of revoked training data from a machine learning model, i.e., machine unlearning, is a non-trivial task that plays a pivotal role in fortifying the privacy and security of ML-based applications. This paper focuses on efficient machine unlearning for random forests under a streaming setting of revocation requests. Existing works are all devoted to speeding up the unlearning process for a single revocation request; none of them targets the streaming scenario. A straightforward solution is to carry out the single-request unlearning technique immediately whenever a new revocation request arrives. This is clearly time-inefficient, since the time cost is proportional to the number of requests in the stream. To solve this problem, this paper proposes a lazy unlearning strategy that carries out the unlearning operations of different revocation requests in a single batch, avoiding redundant computation and enabling computation sharing. In particular, we adopt a node-level unlearning policy, which performs unlearning operations on demand: when a test request arrives, each node on its path is checked to determine whether the revocation requests pending on that node can affect the inference result. Experiments on several real datasets show that, compared with the baseline, the lazy unlearning strategy improves unlearning efficiency by 1.1x-4x on different datasets and reduces the number of retraining operations to 1/4 on average.
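The abstract describes buffering streaming revocation requests and applying them per node only when a test query depends on that node. The paper's implementation is not shown here; the following is a minimal sketch of that lazy, node-level idea for a single tree, where all names (`LazyUnlearningTree`, `revoke`, `Leaf`, `Split`) are hypothetical and not the authors' code.

```python
from collections import Counter

class Leaf:
    """A leaf stores (sample_id, label) pairs; it predicts the majority label."""
    def __init__(self, samples):
        self.samples = list(samples)

    def predict(self):
        return Counter(y for _, y in self.samples).most_common(1)[0][0]

class Split:
    """An internal node that routes a query on one feature against a threshold."""
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

class LazyUnlearningTree:
    """One tree of the forest; revocations are buffered and applied lazily."""
    def __init__(self, root):
        self.root = root
        self.pending = set()  # ids of revoked samples, not yet removed

    def revoke(self, sample_id):
        # Streaming revocation request: O(1), no tree surgery yet.
        self.pending.add(sample_id)

    def predict(self, x):
        node = self.root
        while isinstance(node, Split):
            node = node.left if x[node.feature] <= node.threshold else node.right
        # Node-level lazy unlearning: clean only the leaf this query reaches,
        # applying all buffered revocations that touch it in one batch.
        touched = {sid for sid, _ in node.samples} & self.pending
        if touched:
            node.samples = [(sid, y) for sid, y in node.samples
                            if sid not in touched]
        return node.predict()

# Two streaming revocations are buffered, then applied together at query time.
left = Leaf([(0, "A"), (1, "A"), (2, "B")])
right = Leaf([(3, "B"), (4, "B")])
tree = LazyUnlearningTree(Split(feature=0, threshold=0.5, left=left, right=right))
tree.revoke(0)
tree.revoke(1)
print(tree.predict([0.2]))  # "B": without unlearning the leaf would say "A"
```

In the paper's full strategy, a node whose split statistics are invalidated by revocations would trigger retraining of its subtree; this sketch omits that step and only removes revoked samples from the leaf a query actually reaches. Revoked ids stay in `pending` because other trees of a forest may still hold copies of those samples.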
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Sun, N. et al. (2023). Lazy Machine Unlearning Strategy for Random Forests. In: Yuan, L., Yang, S., Li, R., Kanoulas, E., Zhao, X. (eds) Web Information Systems and Applications. WISA 2023. Lecture Notes in Computer Science, vol 14094. Springer, Singapore. https://doi.org/10.1007/978-981-99-6222-8_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6221-1
Online ISBN: 978-981-99-6222-8