RTS: learning robustly from time series data with noisy label

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Significant progress has been made in machine learning when models are trained on static data with large amounts of clean labels. In many real-world applications, however, the data change over time and massive clean annotations are difficult to obtain; that is, noisy labels and time series must be handled simultaneously. In product-buyer evaluation, for example, each sample records a user's daily behavior over time, but the long transaction period complicates analysis, and salespeople often annotate the user's purchase behavior erroneously. To the best of our knowledge, this novel setting has not been thoroughly studied, and effective machine learning methods for it are still lacking. In this paper, we present RTS, a systematic approach supported both theoretically and empirically, consisting of two components: Noise-Tolerant Time Series Representation and Purified Oversampling Learning. Specifically, we first reduce the destructive impact of label noise to obtain robust feature representations and potentially clean samples. A novel learning method based on the purified data and time series oversampling is then used to train an unbiased model. Theoretical analysis proves that our proposal improves the quality of the noisy data set. Empirical experiments on diverse tasks, including a real-world house-buyer evaluation task and various benchmark tasks, clearly demonstrate that our new algorithm robustly outperforms many competitive methods.
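The abstract only summarizes the two-stage pipeline, so the following is a minimal Python sketch of that purify-then-rebalance structure: first keep likely-clean samples via a small-loss criterion, then oversample minority-class series by interpolating same-class pairs. Everything here is an illustrative assumption rather than the authors' RTS implementation: the function names purify_by_small_loss and oversample_time_series, the logistic-regression proxy on flattened series, the keep_ratio of 0.7, and the toy data are all invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def purify_by_small_loss(X, y, keep_ratio=0.7):
    """Keep the samples whose per-sample loss is smallest.

    Illustrative stand-in for the purification step: a model fit on
    noisy data tends to assign lower loss to correctly labelled samples
    (the small-loss criterion), so low-loss samples are likely clean.
    """
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    proba = clf.predict_proba(X)
    # Per-sample cross-entropy of the assigned label under the model.
    losses = -np.log(proba[np.arange(len(y)), y] + 1e-12)
    keep = np.argsort(losses)[: int(keep_ratio * len(y))]
    return X[keep], y[keep]

def oversample_time_series(X, y, seed=0):
    """SMOTE-style rebalancing: synthesize minority-class series by
    interpolating random same-class pairs until every class matches
    the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        for _ in range(target - n):
            a, b = rng.choice(idx, size=2, replace=True)
            lam = rng.uniform()
            X_parts.append((lam * X[a] + (1.0 - lam) * X[b])[None, :])
            y_parts.append(np.array([c]))
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Toy usage: 200 univariate series of length 50 with an imbalanced,
# noisily labelled positive class.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = (X.mean(axis=1) > 0.15).astype(int)   # minority positive class
flip = rng.random(200) < 0.2              # 20% symmetric label noise
y_noisy = np.where(flip, 1 - y, y)

X_pure, y_pure = purify_by_small_loss(X, y_noisy)      # stage 1: purify
X_bal, y_bal = oversample_time_series(X_pure, y_pure)  # stage 2: rebalance
model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
```

In the actual method, the representation stage is noise-tolerant and the oversampling operates on time series rather than a flattened feature proxy; this sketch only mirrors the overall purify-then-rebalance structure described in the abstract.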

Acknowledgements

This research was supported by the National Key R&D Program of China (2022YFC3340901) and the National Natural Science Foundation of China (Grant No. 62176118).

Author information

Corresponding author

Correspondence to Yu-Feng Li.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Zhi Zhou received a BSc degree from Jilin University, China in 2020. He is currently working toward a PhD degree with the National Key Laboratory for Novel Software Technology, Nanjing University, China. His research interests include weakly-supervised learning, representation learning, and out-of-distribution generalization.

Yi-Xuan Jin received a BSc degree from Northwestern Polytechnical University, China in 2021. He is currently working toward an MS degree with the National Key Laboratory for Novel Software Technology, Nanjing University, China. His research interests include noisy label learning, model reuse, and learnware.

Yu-Feng Li received the BSc and PhD degrees in computer science from Nanjing University, China in 2006 and 2013, respectively. He joined the National Key Laboratory for Novel Software Technology at Nanjing University, China in 2013 and is currently a professor. He is a member of the LAMDA group. He is interested in weakly supervised learning, statistical learning, and optimization. He has received an outstanding doctoral dissertation award from the China Computer Federation (CCF) and Jiangsu Province. He has published more than 70 papers in top-tier journals and conferences such as JMLR, TPAMI, ICML, and NIPS. He has served as an editorial board member of MLJ, co-chair of the ACML22/21 journal track, and area chair of top-tier conferences such as ICML23/22, AISTATS23, NeurIPS23/22, and IJCAI21.

About this article

Cite this article

Zhou, Z., Jin, YX. & Li, YF. RTS: learning robustly from time series data with noisy label. Front. Comput. Sci. 18, 186332 (2024). https://doi.org/10.1007/s11704-023-3200-z
