
SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7301)


Abstract

Machine learning algorithms perform differently in settings with varying levels of training-set mislabeling noise; the choice of a suitable algorithm for a particular learning problem is therefore crucial. In this paper, we introduce the “Sigmoid Rule” Framework (SRF), which describes classifier behavior in noisy settings. The framework builds on an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the parameters of this sigmoid function using five different classifiers, namely Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier. Our study leads to the definition of intuitive criteria, based on the sigmoid parameters, for comparing the behavior of learning algorithms in the presence of varying levels of noise. Furthermore, we show that these parameters are connected to characteristics of the underlying dataset, hinting at how the inherent properties of a dataset affect learning. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and to the mining of noisy data series, as in sensor networks.
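To make the idea concrete, the sketch below fits a generic four-parameter sigmoid to hypothetical (noise level, accuracy) measurements of a single classifier. The parameterization, the accuracy values, and the signal-to-noise proxy are illustrative assumptions for this sketch only, not the exact formulation used in the paper.

```python
# Minimal sketch: fit a generic sigmoid to classifier accuracy measured at
# increasing training-set mislabeling noise levels. The SRF paper's exact
# parameterization may differ; this is only an illustration.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(snr, m, M, k, s0):
    """Generic sigmoid rising from m to M, with slope k and center s0."""
    return m + (M - m) / (1.0 + np.exp(-k * (snr - s0)))

# Hypothetical measurements: mislabeling fraction p and observed accuracy.
noise_levels = np.array([0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.45])
accuracy     = np.array([0.93, 0.92, 0.90, 0.84, 0.74, 0.62, 0.55])

# Crude signal-to-noise proxy: ratio of clean to mislabeled instances,
# clipped to avoid division by zero at p = 0.
snr = (1.0 - noise_levels) / np.clip(noise_levels, 1e-2, None)

params, _ = curve_fit(sigmoid, snr, accuracy,
                      p0=[0.5, 0.95, 1.0, np.median(snr)], maxfev=10000)
m, M, k, s0 = params
print(f"min={m:.3f} max={M:.3f} slope={k:.3f} center={s0:.3f}")
```

Fitted quantities of this kind (minimum and maximum performance, slope, and center of the transition) are the sort of parameters on which the paper's comparison criteria are based, e.g., which classifier degrades more gracefully as noise grows.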




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mirylenka, K., Giannakopoulos, G., Palpanas, T. (2012). SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science(), vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_10


  • DOI: https://doi.org/10.1007/978-3-642-30217-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30216-9

  • Online ISBN: 978-3-642-30217-6

  • eBook Packages: Computer Science, Computer Science (R0)
