Paper

Authors: Rashida Hasan and Cheehung Henry Chu

Affiliation: Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, Louisiana, U.S.A.

Keyword(s): Machine Learning, Classifiers, Learning From Noisy Data, Class Noise, Attribute Noise.

Abstract: Classification is one of the fundamental tasks in machine learning. The quality of data is important in constructing any machine learning model with good prediction performance. Real-world data often suffer from noise, which usually refers to errors, irregularities, and corruptions in a dataset. However, we have no control over the quality of data used in classification tasks. The presence of noise in a dataset has three major negative consequences: (i) a decrease in classification accuracy, (ii) an increase in the complexity of the induced classifier, and (iii) an increase in training time. Therefore, it is important to systematically explore the effects of noise on classification performance. Even though there have been published studies on the effect of noise either for some particular learner or for some particular noise type, there is a lack of studies in which the impact of different noise types on different learners has been investigated. In this work, we focus on both scenarios, various learners and various noise types, and provide a detailed analysis of their effects on prediction performance. We use five different classifiers (J48, Naive Bayes, Support Vector Machine, k-Nearest Neighbor, Random Forest), 10 benchmark datasets from the UCI machine learning repository, and three publicly available image datasets. Our results can be used to guide the development of noise handling mechanisms.
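
The abstract names two noise types, class noise and attribute noise, but does not describe how they are injected. The following is a minimal sketch of one common way to simulate both, not the authors' procedure. It assumes uniform label flipping for class noise and uniform replacement within each column's observed range for attribute noise. The function names `add_class_noise` and `add_attribute_noise`, the `rate` parameter, and the toy data are all hypothetical.

```python
import random


def add_class_noise(labels, rate, classes, rng):
    """Flip a fraction `rate` of labels to a different class chosen uniformly.

    Assumption: uniform (symmetric) label noise; needs at least two classes.
    """
    noisy = list(labels)
    n_flip = round(rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def add_attribute_noise(rows, rate, rng):
    """Corrupt a fraction `rate` of the values in every numeric column.

    Assumption: each corrupted value is redrawn uniformly from the range
    observed in the clean column. The input rows are left unmodified.
    """
    noisy = [list(r) for r in rows]
    n_rows = len(noisy)
    n_cols = len(noisy[0]) if noisy else 0
    n_corrupt = round(rate * n_rows)
    for j in range(n_cols):
        column = [r[j] for r in rows]
        lo, hi = min(column), max(column)
        for i in rng.sample(range(n_rows), n_corrupt):
            noisy[i][j] = rng.uniform(lo, hi)
    return noisy


# Toy data (hypothetical): 20 instances, two classes, two numeric attributes.
rng = random.Random(0)
labels = ["a", "b"] * 10
features = [[float(i), float(i % 3)] for i in range(20)]

noisy_labels = add_class_noise(labels, 0.2, ["a", "b"], rng)
noisy_features = add_attribute_noise(features, 0.2, rng)

# Count how many labels actually changed.
flipped = sum(1 for clean, noisy in zip(labels, noisy_labels) if clean != noisy)
```

Training each classifier on the noisy copy and evaluating it on clean test data, at several values of `rate`, would give the kind of noise-versus-performance comparison the abstract describes.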

CC BY-NC-ND 4.0

Paper citation in several formats:
Hasan, R. and Chu, C. (2022). Noise in Datasets: What Are the Impacts on Classification Performance?. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-549-4; ISSN 2184-4313, SciTePress, pages 163-170. DOI: 10.5220/0010782200003122

@conference{icpram22,
author={Rashida Hasan and Cheehung Henry Chu},
title={Noise in Datasets: What Are the Impacts on Classification Performance?},
booktitle={Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2022},
pages={163-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010782200003122},
isbn={978-989-758-549-4},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Noise in Datasets: What Are the Impacts on Classification Performance?
SN - 978-989-758-549-4
IS - 2184-4313
AU - Hasan, R.
AU - Chu, C.
PY - 2022
SP - 163
EP - 170
DO - 10.5220/0010782200003122
PB - SciTePress