Abstract
Removing the impact of revoked training data from a machine learning model, i.e., machine unlearning, is a non-trivial task that plays a pivotal role in fortifying the privacy and security of ML-based applications. This paper focuses on efficient machine unlearning for random forests under a streaming setting of revocation requests. Existing works are all devoted to speeding up the unlearning process for a single revocation request; none of them targets the streaming scenario. A straightforward solution is to carry out the single-request unlearning technique immediately whenever a new revocation request arrives. This is clearly time-inefficient, since the time cost is proportional to the number of requests in the stream. To solve this problem, this paper proposes a lazy unlearning strategy that carries out the unlearning operations of different revocation requests in a single batch, avoiding redundant computation and enabling computation sharing. In particular, we adopt a node-level unlearning policy, which performs unlearning operations on demand: when a test request arrives, each node on its path is checked to determine whether the revocation requests pending on that node can affect the inference result. Experiments on several real datasets show that, compared with the baseline, the lazy unlearning strategy improves unlearning efficiency by 1.1x-4x on different datasets and reduces the number of retraining operations to 1/4 on average.
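The abstract describes buffering streaming revocation requests and applying them per node only when a test query depends on that node. The paper's implementation is not shown here; the following is a minimal sketch of that lazy, node-level idea for a single tree, where all names (`LazyUnlearningTree`, `revoke`, `Leaf`, `Split`) are hypothetical and not the authors' code.

```python
from collections import Counter

class Leaf:
    """A leaf stores (sample_id, label) pairs; it predicts the majority label."""
    def __init__(self, samples):
        self.samples = list(samples)

    def predict(self):
        return Counter(y for _, y in self.samples).most_common(1)[0][0]

class Split:
    """An internal node that routes a query on one feature against a threshold."""
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

class LazyUnlearningTree:
    """One tree of the forest; revocations are buffered and applied lazily."""
    def __init__(self, root):
        self.root = root
        self.pending = set()  # ids of revoked samples, not yet removed

    def revoke(self, sample_id):
        # Streaming revocation request: O(1), no tree surgery yet.
        self.pending.add(sample_id)

    def predict(self, x):
        node = self.root
        while isinstance(node, Split):
            node = node.left if x[node.feature] <= node.threshold else node.right
        # Node-level lazy unlearning: clean only the leaf this query reaches,
        # applying all buffered revocations that touch it in one batch.
        touched = {sid for sid, _ in node.samples} & self.pending
        if touched:
            node.samples = [(sid, y) for sid, y in node.samples
                            if sid not in touched]
        return node.predict()

# Two streaming revocations are buffered, then applied together at query time.
left = Leaf([(0, "A"), (1, "A"), (2, "B")])
right = Leaf([(3, "B"), (4, "B")])
tree = LazyUnlearningTree(Split(feature=0, threshold=0.5, left=left, right=right))
tree.revoke(0)
tree.revoke(1)
print(tree.predict([0.2]))  # "B": without unlearning the leaf would say "A"
```

In the paper's full strategy, a node whose split statistics are invalidated by revocations would trigger retraining of its subtree; this sketch omits that step and only removes revoked samples from the leaf a query actually reaches. Revoked ids stay in `pending` because other trees of a forest may still hold copies of those samples.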
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Sun, N. et al. (2023). Lazy Machine Unlearning Strategy for Random Forests. In: Yuan, L., Yang, S., Li, R., Kanoulas, E., Zhao, X. (eds) Web Information Systems and Applications. WISA 2023. Lecture Notes in Computer Science, vol 14094. Springer, Singapore. https://doi.org/10.1007/978-981-99-6222-8_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6221-1
Online ISBN: 978-981-99-6222-8