
Find Important Training Dataset by Observing the Training Sequence Similarity

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14256)


Abstract

It is imperative to eliminate training data that have minimal impact on model accuracy. In addition to eliminating training data that share similar features, we propose a novel concept called the training sequence, which records the trajectory of correct and incorrect predictions for each training example over the training epochs. We eliminate training data that exhibit similar training trajectories, and we complement this approach by identifying hard-to-forget training data that are consistently predicted correctly. We conducted extensive experiments on various classical classification tasks and compared our approach with the forgetting-score method. Our experimental findings demonstrate that our approach outperforms the forgetting-score approach by up to 13.2% and is particularly effective at low training-data retention ratios, implying that our method can select important training data with satisfactory performance. Our open-source code is available at the following link: https://github.com/sheldonlll/angle_method.
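
As a reading aid, here is a minimal NumPy sketch of the idea the abstract describes: record, for every training example, a binary trajectory of correct (1) and incorrect (0) predictions across epochs, flag examples that are correct at every epoch as hard to forget, and de-duplicate the remaining examples by trajectory similarity. The cosine-similarity measure, the 0.95 threshold, the greedy de-duplication, and the choice to treat always-correct examples as removable are all assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def select_by_training_sequence(trajectories, sim_threshold=0.95):
    """Sketch of training-sequence-based data selection (assumptions noted above).

    trajectories: (n_examples, n_epochs) 0/1 matrix; entry (i, t) is 1 if
    example i was predicted correctly at epoch t.
    Returns the indices of the examples to retain.
    """
    # Hard-to-forget examples: correct at every epoch. Treating them as
    # removable follows the forgetting-score intuition; whether the paper
    # removes or retains them is an assumption of this sketch.
    always_correct = trajectories.all(axis=1)
    candidates = np.flatnonzero(~always_correct)

    # Greedy de-duplication: keep a trajectory only if it is sufficiently
    # dissimilar (by cosine similarity) from every trajectory kept so far.
    norms = np.linalg.norm(trajectories, axis=1)
    kept = []
    for i in candidates:
        if norms[i] == 0:  # never predicted correctly: keep as a hard example
            kept.append(i)
            continue
        duplicate = any(
            trajectories[i] @ trajectories[j] / (norms[i] * norms[j]) > sim_threshold
            for j in kept if norms[j] > 0
        )
        if not duplicate:
            kept.append(i)
    return np.array(kept)

# Toy usage: six examples tracked over five epochs.
traj = np.array([
    [1, 1, 1, 1, 1],  # always correct -> treated as removable
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],  # duplicate of the previous trajectory -> dropped
    [0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],  # never learned -> kept as a hard example
])
print(select_by_training_sequence(traj))  # [1 3 4 5]
```

In a real training loop the trajectory matrix would be filled in once per epoch by comparing each example's predicted label against its ground truth; the selection step then runs once, after training, to choose the subset used for retraining.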

References

  1. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48 (2009)

  2. Chang, H.S., Learned-Miller, E., McCallum, A.: Active bias: training more accurate neural networks by emphasizing high variance samples. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  3. DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017)

  4. Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Adversarial examples are not bugs, they are features. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  5. Jiang, L., Zhou, Z., Leung, T., Li, L.J., Fei-Fei, L.: MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels. In: International Conference on Machine Learning, pp. 2304–2313. PMLR (2018)

  6. Katharopoulos, A., Fleuret, F.: Not all samples are created equal: deep learning with importance sampling. In: International Conference on Machine Learning, pp. 2525–2534. PMLR (2018)

  7. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)

  8. Li, J., Wong, Y., Zhao, Q., Kankanhalli, M.S.: Learning to learn from noisy labeled data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5051–5059 (2019)

  9. Masters, D., Luschi, C.: Revisiting small batch training for deep neural networks. arXiv preprint arXiv:1804.07612 (2018)

  10. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115, 211–252 (2015)

  11. Toneva, M., Sordoni, A., Combes, R.T.D., Trischler, A., Bengio, Y., Gordon, G.J.: An empirical study of example forgetting during deep neural network learning. arXiv preprint arXiv:1812.05159 (2018)

  12. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  13. Wang, Y., Gan, W., Yang, J., Wu, W., Yan, J.: Dynamic curriculum learning for imbalanced data classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5017–5026 (2019)

  14. Wu, L., Zhu, Z., et al.: Towards understanding generalization of deep learning: perspective of loss landscapes. arXiv preprint arXiv:1706.10239 (2017)

  15. Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)


Author information

Corresponding author: Fan Zhang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Z., Diao, H., Zhang, F., Khan, S.U. (2023). Find Important Training Dataset by Observing the Training Sequence Similarity. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14256. Springer, Cham. https://doi.org/10.1007/978-3-031-44213-1_34

  • DOI: https://doi.org/10.1007/978-3-031-44213-1_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44212-4

  • Online ISBN: 978-3-031-44213-1

  • eBook Packages: Computer Science, Computer Science (R0)
