DOI: 10.1145/3410530.3414366
research-article

Identifying label noise in time-series datasets

Published: 12 September 2020

Abstract

Reliably labeled datasets are crucial to the performance of supervised learning methods, and time-series data pose additional challenges. Data points lying on the borders between classes can be mislabeled because of the perceptual limitations of human labelers. Sensor measurements may also not be directly interpretable by humans, so label noise cannot readily be removed by hand. As a result, time-series datasets often contain a significant amount of label noise that can degrade the performance of machine learning models. This work focuses on identifying and removing label noise by extending previous methods developed for static instances to the domain of time-series data. We use a combination of deep learning and visualization algorithms to facilitate automatic noise removal. We show that our approach can identify mislabeled instances, which results in improved classification accuracy on four synthetic and two real, publicly available human activity datasets.
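The abstract does not spell out the algorithm, so what follows is only a minimal sketch of one classic static-instance method of the kind it refers to: a cross-validated classification filter (in the spirit of Brodley and Friedl, 1999), applied here to fixed-length time-series windows. It is not the authors' pipeline. The per-channel mean and standard deviation features, the nearest-centroid classifier, and the names window_features, nearest_centroid_predict and flag_suspect_labels are hypothetical choices made for illustration; the abstract instead describes a combination of deep learning and visualization algorithms.

```python
import numpy as np


def window_features(windows):
    """Summarize each (length, channels) window by per-channel mean and std.

    Hypothetical hand-crafted features for illustration only; a learned
    embedding (e.g. from a CNN) could be substituted here.
    """
    windows = np.asarray(windows, dtype=float)  # shape (n, length, channels)
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row the class whose training centroid is closest."""
    classes = np.unique(train_y)  # only classes present in this training fold
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test row to every centroid: (m, k)
    dist = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def flag_suspect_labels(windows, labels, n_folds=5, seed=0):
    """Classification filter: flag windows whose out-of-fold prediction
    disagrees with the given label. Returns a boolean mask (True = suspect)."""
    x = window_features(windows)
    y = np.asarray(labels)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    suspect = np.zeros(len(y), dtype=bool)
    for held_out in np.array_split(order, n_folds):
        train = np.setdiff1d(order, held_out)  # every index not held out
        pred = nearest_centroid_predict(x[train], y[train], x[held_out])
        suspect[held_out] = pred != y[held_out]
    return suspect


if __name__ == "__main__":
    # Synthetic demo: two well-separated classes of 3-channel windows,
    # with the labels of three class-0 windows deliberately flipped to 1.
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(0.0, 0.1, (40, 50, 3)),
                           rng.normal(3.0, 0.1, (40, 50, 3))])
    noisy = np.array([0] * 40 + [1] * 40)
    noisy[[0, 1, 2]] = 1
    print(np.flatnonzero(flag_suspect_labels(data, noisy)).tolist())
```

In this sketch a window is flagged when a classifier trained without it assigns it a different class than its given label; flagged windows could then be inspected or removed before retraining. Replacing window_features with a learned representation is the natural place where a deep model would enter.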

Published In

UbiComp/ISWC '20 Adjunct: Adjunct Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers
September 2020
732 pages
ISBN:9781450380768
DOI:10.1145/3410530
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CNN
  2. accelerometer
  3. human activity recognition
  4. label cleaning
  5. label noise
  6. neural networks
  7. time-series data

Conference

UbiComp/ISWC '20

Acceptance Rates

Overall Acceptance Rate 764 of 2,912 submissions, 26%

Article Metrics

  • Downloads (Last 12 months)54
  • Downloads (Last 6 weeks)3
Reflects downloads up to 03 Mar 2025

Cited By

  • (2025) Self-supervised learning reduces label noise in sharp wave ripple classification. Scientific Reports 15:1. DOI: 10.1038/s41598-025-90380-x. Online publication date: 5-Mar-2025.
  • (2024) Spindle Detection Based on Elastic Time Window and Spatial Pyramid Pooling. Journal of Integrative Neuroscience 23:7. DOI: 10.31083/j.jin2307134. Online publication date: 17-Jul-2024.
  • (2024) Collecting Self-reported Physical Activity and Posture Data Using Audio-based Ecological Momentary Assessment. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:3, 1-35. DOI: 10.1145/3678584. Online publication date: 9-Sep-2024.
  • (2024) Polkadot Cryptocurrency Close Price Prediction Using Machine Learning. 2024 4th International Conference of Science and Information Technology in Smart Administration (ICSINTESA), 259-264. DOI: 10.1109/ICSINTESA62455.2024.10748125. Online publication date: 12-Jul-2024.
  • (2024) Data cleaning and machine learning: a systematic literature review. Automated Software Engineering 31:2. DOI: 10.1007/s10515-024-00453-w. Online publication date: 11-Jun-2024.
  • (2023) CTW. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 4046-4054. DOI: 10.24963/ijcai.2023/450. Online publication date: 19-Aug-2023.
  • (2023) Convolutional Multiple Instance Learning for Sleep Spindle Detection With Label Refinement. IEEE Transactions on Cognitive and Developmental Systems 15:1, 272-284. DOI: 10.1109/TCDS.2022.3159285. Online publication date: Mar-2023.
  • (2023) Human Activity Recognition (HAR) Using Deep Learning: Review, Methodologies, Progress and Future Research Directions. Archives of Computational Methods in Engineering 31:1, 179-219. DOI: 10.1007/s11831-023-09986-x. Online publication date: 12-Aug-2023.
  • (2023) Rts: learning robustly from time series data with noisy label. Frontiers of Computer Science 18:6. DOI: 10.1007/s11704-023-3200-z. Online publication date: 28-Dec-2023.
  • (2022) Autorevise. Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, 81-84. DOI: 10.1145/3477314.3507222. Online publication date: 25-Apr-2022.