research-article

Identifying label noise in time-series datasets

Authors:

Gentry Atkinson,

Vangelis Metsis

UbiComp/ISWC '20 Adjunct: Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers

Pages 238 - 243

https://doi.org/10.1145/3410530.3414366

Published: 12 September 2020

Abstract

Reliably labeled datasets are crucial to the performance of supervised learning methods. Time-series data pose additional challenges. Data points lying on borders between classes can be mislabeled due to perception limitations of human labelers. Sensor measurements may not be directly interpretable by humans. Thus label noise cannot be manually removed. As a result, time-series datasets often contain a significant amount of label noise that can degrade the performance of machine learning models. This work focuses on label noise identification and removal by extending previous methods developed for static instances to the domain of time-series data. We use a combination of deep learning and visualization algorithms to facilitate automatic noise removal. We show that our approach can identify mislabeled instances, which results in improved classification accuracy on four synthetic and two real publicly available human activity datasets.
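
As a rough illustration of what identifying mislabeled time-series instances can look like, the sketch below flags instances whose label disagrees with most of their nearest neighbors in a feature space. It is not the authors' pipeline: two hand-crafted features and a k-nearest-neighbor vote stand in for the deep learning and visualization steps named in the abstract. The synthetic sine-wave data and the names `make_series`, `features`, `flag_suspects`, and `demo` are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def make_series(freq, length=64, noise=0.05, rng=random):
    """Synthetic univariate series: a sine wave plus Gaussian jitter."""
    return [math.sin(2 * math.pi * freq * t / length) + rng.gauss(0, noise)
            for t in range(length)]


def features(series):
    """Two hand-crafted features (an assumed stand-in for learned features)."""
    pairs = list(zip(series, series[1:]))
    zero_crossings = sum(1 for a, b in pairs if a * b < 0)
    mean_abs_diff = sum(abs(b - a) for a, b in pairs) / len(pairs)
    return (zero_crossings, mean_abs_diff)


def flag_suspects(feats, labels, k=5):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbors in feature space."""
    suspects = []
    for i, fi in enumerate(feats):
        # Sort all other instances by Euclidean distance to instance i.
        others = sorted((math.dist(fi, fj), j)
                        for j, fj in enumerate(feats) if j != i)
        votes = Counter(labels[j] for _, j in others[:k])
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != labels[i]:
            suspects.append(i)
    return suspects


def demo(seed=0):
    """Build two synthetic classes, flip six labels, and flag suspects."""
    rng = random.Random(seed)
    series = [make_series(2, rng=rng) for _ in range(30)]
    series += [make_series(8, rng=rng) for _ in range(30)]
    labels = [0] * 30 + [1] * 30
    flipped = sorted(rng.sample(range(60), 6))  # injected label noise
    for i in flipped:
        labels[i] = 1 - labels[i]
    suspects = flag_suspects([features(s) for s in series], labels)
    return flipped, suspects


if __name__ == "__main__":
    flipped, suspects = demo()
    print("injected label flips:", flipped)
    print("flagged as suspect:  ", suspects)
```

Comparing the two printed lists shows how many injected flips the neighbor vote recovers on this toy data. On real sensor data the feature extractor would need to be far richer, and flagged instances would typically be reviewed or removed before retraining a classifier.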

Index Terms

Computing methodologies → Machine learning → Learning paradigms → Supervised learning → Supervised learning by classification

Published In

UbiComp/ISWC '20 Adjunct: Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers

September 2020

732 pages

ISBN:9781450380768

DOI:10.1145/3410530

General Chairs:
Monica Tentori
CICESE, Mexico
,
Nadir Weibel
UC San Diego
,
Kristof Van Laerhoven
University of Siegen, Germany
,
Program Chairs:
Gregory Abowd
Georgia Tech
,
Flora Salim
RMIT, Australia

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

UbiComp/ISWC '20

UbiComp/ISWC '20: 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and 2020 ACM International Symposium on Wearable Computers

September 12 - 17, 2020

Virtual Event, Mexico

Acceptance Rates

Overall Acceptance Rate 764 of 2,912 submissions, 26%

Cited By

Graf S, Meyrand P, Herry C, Bem T, Tsai F (2025). Self-supervised learning reduces label noise in sharp wave ripple classification. Scientific Reports 15:1. Online publication date: 5-Mar-2025
https://doi.org/10.1038/s41598-025-90380-x
Ou Y, Wang F, Feng B, Tang L, Pan J (2024). Spindle Detection Based on Elastic Time Window and Spatial Pyramid Pooling. Journal of Integrative Neuroscience 23:7. Online publication date: 17-Jul-2024
https://doi.org/10.31083/j.jin2307134
Le H, Lakshminarayanan R, Li J, Mishra V, Intille S (2024). Collecting Self-reported Physical Activity and Posture Data Using Audio-based Ecological Momentary Assessment. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:3 (1-35). Online publication date: 9-Sep-2024
https://dl.acm.org/doi/10.1145/3678584
Abdiwijaya E, Lucky H (2024). Polkadot Cryptocurrency Close Price Prediction Using Machine Learning. 2024 4th International Conference of Science and Information Technology in Smart Administration (ICSINTESA) (259-264). Online publication date: 12-Jul-2024
https://doi.org/10.1109/ICSINTESA62455.2024.10748125
Côté P, Nikanjam A, Ahmed N, Humeniuk D, Khomh F (2024). Data cleaning and machine learning: a systematic literature review. Automated Software Engineering 31:2. Online publication date: 11-Jun-2024
https://doi.org/10.1007/s10515-024-00453-w
Ma P, Liu Z, Zheng J, Wang L, Ma Q, Elkind E (2023). CTW. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (4046-4054). Online publication date: 19-Aug-2023
https://dl.acm.org/doi/10.24963/ijcai.2023/450
Sun X, Qi Y, Wang Y, Pan G (2023). Convolutional Multiple Instance Learning for Sleep Spindle Detection With Label Refinement. IEEE Transactions on Cognitive and Developmental Systems 15:1 (272-284). Online publication date: Mar-2023
https://doi.org/10.1109/TCDS.2022.3159285
Kumar P, Chauhan S, Awasthi L (2023). Human Activity Recognition (HAR) Using Deep Learning: Review, Methodologies, Progress and Future Research Directions. Archives of Computational Methods in Engineering 31:1 (179-219). Online publication date: 12-Aug-2023
https://doi.org/10.1007/s11831-023-09986-x
Zhou Z, Jin Y, Li Y (2023). Rts: learning robustly from time series data with noisy label. Frontiers of Computer Science 18:6. Online publication date: 28-Dec-2023
https://doi.org/10.1007/s11704-023-3200-z
Piane J, Wang Y, Ma X, Furst J, Raicu D, Hong J, Bures M, Park J, Cerny T (2022). Autorevise. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing (81-84). Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3477314.3507222
