Abstract
New international regulations on personal data management guarantee the ‘Right to Be Forgotten’: individuals may request that their data be erased from third-party tools and services. This requirement is especially challenging for machine learning estimators, which must then forget portions of their knowledge. In this paper, we investigate the impact of such learning and forgetting policies on data stream learning. In data stream mining, the sheer volume of instances typically makes it infeasible to store the data or to retrain the learning models from scratch, so more efficient solutions are needed to deal with the dynamic nature of online machine learning. We modify an incremental k-NN classifier so that it can erase its past data, and we investigate the impact of this forgetting on the obtained predictive performance. Our proposal is compared against the original k-NN algorithm on seven non-stationary stream datasets. Our results show that the forgetting-enabled algorithm achieves prediction patterns similar to those of the vanilla algorithm, although it yields lower predictive performance at the beginning of the learning process. This is a typical cold-start behavior often observed in data stream mining applications, and it is not necessarily related to the forgetting mechanism employed.
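The abstract does not detail how the modified classifier erases data. As a minimal sketch, assume a sliding-window k-NN in which each stored instance is tagged with the identifier of the person it belongs to. Under that assumption, an erasure request can be served by deleting that person's instances from the window. Because k-NN is a lazy learner whose model consists of the stored instances themselves, deleting them removes their influence on later predictions without any retraining. The class, method, and parameter names (`ForgettingKNN`, `learn_one`, `forget_owner`, `predict_one`, `prequential_accuracy`, `owner_id`) are hypothetical illustrations. They do not reproduce the authors' implementation or the scikit-multiflow API. The evaluation helper follows the test-then-train (prequential) scheme commonly used in stream mining.

```python
import math
from collections import Counter, OrderedDict


class ForgettingKNN:
    """Hypothetical incremental k-NN with a bounded window and per-owner erasure."""

    def __init__(self, n_neighbors=5, max_window_size=1000):
        self.n_neighbors = n_neighbors
        self.max_window_size = max_window_size
        # instance_id -> (owner_id, features, label); insertion order = arrival order
        self._window = OrderedDict()
        self._next_id = 0

    def __len__(self):
        return len(self._window)

    def learn_one(self, x, y, owner_id=None):
        """Store one labelled instance, evicting the oldest one if the window is full."""
        self._window[self._next_id] = (owner_id, tuple(x), y)
        self._next_id += 1
        if len(self._window) > self.max_window_size:
            self._window.popitem(last=False)  # sliding window: drop the oldest instance

    def forget_owner(self, owner_id):
        """Erase every stored instance tagged with owner_id; return how many were removed."""
        to_delete = [iid for iid, (owner, _, _) in self._window.items() if owner == owner_id]
        for iid in to_delete:
            del self._window[iid]
        return len(to_delete)

    def predict_one(self, x):
        """Majority vote among the k nearest stored instances (Euclidean distance)."""
        if not self._window:
            return None  # cold start: no stored instances yet
        neighbors = sorted(
            ((math.dist(x, xi), yi) for _, xi, yi in self._window.values()),
            key=lambda pair: pair[0],
        )
        votes = Counter(yi for _, yi in neighbors[: self.n_neighbors])
        return votes.most_common(1)[0][0]


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: predict each instance before learning from it."""
    correct = total = 0
    for x, y, owner_id in stream:
        correct += model.predict_one(x) == y
        total += 1
        model.learn_one(x, y, owner_id=owner_id)
    return correct / total if total else 0.0


if __name__ == "__main__":
    model = ForgettingKNN(n_neighbors=1, max_window_size=100)
    stream = [
        ((0.0, 0.0), "a", "alice"),
        ((0.1, 0.0), "a", "bob"),
        ((5.0, 5.0), "b", "alice"),
        ((5.1, 5.0), "b", "bob"),
    ]
    print(prequential_accuracy(model, stream))  # 0.5
    print(model.forget_owner("alice"))  # 2 instances erased
    print(len(model))  # 2 instances remain
```

With an empty window the sketch returns `None`, so the first prediction of any run is necessarily a miss. Early predictions therefore suffer from the kind of cold-start effect the abstract mentions. After `forget_owner("alice")`, only the instances tagged `"bob"` take part in later votes.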
Notes
1. Available at https://scikit-multiflow.github.io/.
2.
Acknowledgments
This research was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), Process n. 88882.183880, and by PIBIC/CNPq/UFF. We also gratefully acknowledge Albert Bifet and Paristech University for hosting Flávia Bernardini for a week and for the discussions that led to these results.
Copyright information
© 2022 IFIP International Federation for Information Processing
About this paper
Cite this paper
Libera, C., Miranda, L., Bernardini, F., Mastelini, S., Viterbo, J. (2022). ‘Right to Be Forgotten’: Analyzing the Impact of Forgetting Data Using K-NN Algorithm in Data Stream Learning. In: Janssen, M., et al. Electronic Government. EGOV 2022. Lecture Notes in Computer Science, vol 13391. Springer, Cham. https://doi.org/10.1007/978-3-031-15086-9_34
DOI: https://doi.org/10.1007/978-3-031-15086-9_34
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15085-2
Online ISBN: 978-3-031-15086-9
eBook Packages: Computer Science, Computer Science (R0)