Lazy Machine Unlearning Strategy for Random Forests

  • Conference paper
Web Information Systems and Applications (WISA 2023)

Abstract

Machine unlearning, i.e., removing the influence of revoked training data from a machine learning model, is a non-trivial task that plays a pivotal role in strengthening the privacy and security of ML-based applications. This paper focuses on efficient machine unlearning for random forests when revocation requests arrive as a stream. Existing work is devoted to speeding up the unlearning of a single revocation request; none targets the streaming scenario. A straightforward solution is to run a single-request unlearning technique immediately whenever a new revocation request arrives. That is time-inefficient, since the time cost is proportional to the number of requests in the stream. To solve this problem, this paper proposes a lazy unlearning strategy that carries out the unlearning operations of different revocation requests in a single batch, avoiding redundant computation and sharing computation across requests. In particular, we adopt a node-level unlearning policy, which performs unlearning operations on demand: when a testing request arrives, it checks whether the revocation requests pending on a node can affect the inference for that testing request. Experiments on several real datasets show that, compared to the baseline, the lazy unlearning strategy improves unlearning efficiency by 1.1X-4X across datasets and reduces the number of retraining operations to 1/4 on average.
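
The lazy, node-level idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the names `Node`, `LazyTree`, `revoke`, and `predict` are hypothetical, a single tree stands in for a forest, and split thresholds are assumed to stay fixed, so the step where a node's split is re-evaluated and its subtree possibly retrained is omitted. The sketch only shows how revocation requests can be buffered and then applied in batches on the nodes that a testing request actually visits.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    counts: Counter = field(default_factory=Counter)  # label counts (leaves only)
    pending: list = field(default_factory=list)       # buffered revocation requests


class LazyTree:
    """Hypothetical single-tree illustration of lazy, node-level unlearning."""

    def __init__(self, root: Node):
        self.root = root
        self.flushes = 0  # number of batched node updates performed so far

    def revoke(self, x, y):
        # Lazy step: buffer the request at the root and do no other work.
        self.root.pending.append((x, y))

    def _flush(self, node: Node):
        # Apply every request buffered on this node in a single batch.
        if not node.pending:
            return
        self.flushes += 1
        batch, node.pending = node.pending, []
        if node.feature is None:
            # Leaf: remove the revoked samples from the label counts.
            for _, y in batch:
                node.counts[y] -= 1
        else:
            # Internal node: hand each request to the child it belongs to.
            # (Assumption: the split itself is left unchanged.)
            for x, y in batch:
                goes_left = x[node.feature] <= node.threshold
                (node.left if goes_left else node.right).pending.append((x, y))

    def predict(self, x):
        node = self.root
        while True:
            self._flush(node)  # unlearn on demand, only along the inference path
            if node.feature is None:
                return max(node.counts, key=node.counts.get)
            goes_left = x[node.feature] <= node.threshold
            node = node.left if goes_left else node.right
```

In this sketch, requests that fall into branches no testing request visits stay buffered, and several requests reaching the same node are handled in one update, which is the kind of computation sharing the abstract refers to.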

Author information

Corresponding author

Correspondence to Ning Wang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, N. et al. (2023). Lazy Machine Unlearning Strategy for Random Forests. In: Yuan, L., Yang, S., Li, R., Kanoulas, E., Zhao, X. (eds) Web Information Systems and Applications. WISA 2023. Lecture Notes in Computer Science, vol 14094. Springer, Singapore. https://doi.org/10.1007/978-981-99-6222-8_32

  • DOI: https://doi.org/10.1007/978-981-99-6222-8_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-6221-1

  • Online ISBN: 978-981-99-6222-8

  • eBook Packages: Computer Science, Computer Science (R0)
