Solving classification problems by knowledge sets
Introduction
One way to improve generalization performance in classification problems is to incorporate additional knowledge, sometimes called prior knowledge. Various types of prior knowledge have already been incorporated into SVM. In [1], the authors distinguish two types of prior knowledge: knowledge about class invariance and knowledge about the data. The first type includes knowledge about classification in regions of the input space [2], [3], [4], [5], and knowledge about class invariance under transformations of the input. The second type includes knowledge about unlabeled examples, class imbalance, and data quality. In [2], [3], the authors informally proposed the concept of knowledge sets, for example cubes assumed to belong to one of the two categories; they concentrated on incorporating prior knowledge in the form of polyhedral knowledge sets. In this paper, instead of incorporating prior knowledge, we use the concept of knowledge sets to model a standard classification problem based only on training examples. We can interpret a knowledge set as information about classification for a set of data points in the input space. A decision boundary is supposed to lie outside the knowledge sets (in an uncertain set). A similar concept of uncertainty appears in version spaces, which were used in Bayes point machines (BPS) [6]. A version space is the set of hypotheses that are consistent with a training sample. A soft version space is a version space in which errors on the training data are allowed and are controlled by a parameter. From each version space, the BPS method selects a representative candidate solution, the Bayes point, which is approximated by the center of mass of a polyhedron. In [7], instead of a version space, the authors maintain a set of possible weight vectors in the form of an axis-aligned box, and they choose as candidate the center of mass of the box.
In BPS, the final version space is chosen according to the empirical test error, while in [7] the authors compare different boxes by using principles from SVM: empirical risk minimization (ERM) and structural risk minimization (SRM) for the worst-case hypothesis from the box. They also added a third principle of large volume. The large volume transductive principle was briefly treated in [8] for the case of hyperplanes and extended in [9]. In our approach, we deal with uncertainty in the input space instead of the hypothesis space. We propose a theoretical model of knowledge sets, in which we define knowledge sets and an uncertain set. The knowledge sets are defined purely in terms of sets, without assuming any particular space for the elements or particular shapes, such as boxes.
There are at least three models of a classification problem [10]: the risk minimization model; estimating the regression function of expected conditional probabilities of classification for given data; and the Bayes approach of estimating density functions of the conditional probabilities of the data for particular classes. None of these models is suitable for the concept of knowledge sets, so we propose a new classification model, called the knowledge set model.

Remark 1. In the knowledge set model, we first generate knowledge sets. Then, we find a classifier based on the knowledge sets.
The best-known method of classification based on predicting density functions from sample data is the Bayes classifier, whose decision boundary is given by the intersection of the density functions. Density functions can be estimated using, for example, Kernel Density Estimation (KDE) [11]. In this paper, we propose a classification method based on the knowledge set model, called knowledge set machines (KSM). In the proposed method, instead of predicting the decision boundary directly from estimated density functions, we add an intermediate step: constructing knowledge sets. Then, we find a classifier based on the knowledge sets by using the maximal margin principle, used for example in SVM. Knowledge sets can be interpreted as a partitioning of the input space. The best-known partitioning algorithms for classification are decision trees, which create boxes with a particular classification. There have been attempts to improve partitioning by using tighter boxes that cover only part of the input space, and using other boxes to classify the rest of the space [12].
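The KDE-based Bayes baseline described above can be sketched as follows. This is a minimal illustration using SciPy's `gaussian_kde` on synthetic one-dimensional data with equal class priors; it shows the density-comparison classifier that KSM replaces with an intermediate knowledge-set step, not the KSM method itself.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two overlapping 1-D classes drawn from Gaussians centered at +2 and -2.
x_pos = rng.normal(loc=2.0, scale=1.0, size=200)
x_neg = rng.normal(loc=-2.0, scale=1.0, size=200)

kde_pos = gaussian_kde(x_pos)  # density estimate for class +1
kde_neg = gaussian_kde(x_neg)  # density estimate for class -1

def bayes_classify(x):
    # With equal priors, predict the class with the larger estimated density.
    return 1 if kde_pos(x)[0] > kde_neg(x)[0] else -1

print(bayes_classify(3.0), bayes_classify(-3.0))
```

The decision boundary of this classifier is exactly the set of points where the two estimated densities intersect.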
The outline of the paper is as follows. First, we analyze the knowledge set model. Then, we present the KSM method based on this model. Finally, we report experiments and results. Introductions to SVM and density estimation are given in Appendix B (Support vector classification basics) and Appendix C (Density estimation basics), respectively.
Section snippets
Knowledge set model
At the beginning, we present some basic definitions and propositions. The notation for the knowledge set model is described in Appendix A. We will define some mathematical structures on a set, which consist of environment objects, common to the proposed structures, and main objects. We propose the following set of environment objects, E: a universe X of possible elements of the set S, a set C of possible classes, and a set of mappings M, each mapping some subset of X to some class c ∈ C. We
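As an informal illustration of the environment objects (a universe, a set of classes, and labeled subsets), the following Python sketch is ours, not the paper's formal notation; names such as `KnowledgeSet` and the toy universe are hypothetical.

```python
from dataclasses import dataclass

# Illustrative encoding of the environment objects E: a universe X,
# a set C of possible classes, and labeled subsets of X.
X = frozenset(range(10))   # toy universe of elements
C = frozenset({-1, 1})     # possible classes

@dataclass(frozen=True)
class KnowledgeSet:
    elements: frozenset  # a subset of X
    label: int           # its class, an element of C

ks_pos = KnowledgeSet(frozenset({0, 1, 2}), 1)
ks_neg = KnowledgeSet(frozenset({7, 8, 9}), -1)

# The uncertain set: elements covered by no knowledge set,
# where a decision boundary is allowed to lie.
uncertain = X - ks_pos.elements - ks_neg.elements
print(sorted(uncertain))
```

The point of the sketch is only that knowledge sets are defined purely set-theoretically: nothing here assumes a vector space or a particular shape such as a box.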
The classification method based on knowledge sets
We propose a classification method based on knowledge sets, which we call KSM. Consider a binary classification problem, with predicted and . In analyzing the uncertain models, we have found that we should look for margin and possibly superlevel knowledge sets. Because we do not know the prediction bands and the limited domain, we should test margin and superlevel knowledge sets with different values of the parameters. If we have a limited number of possible tests, we
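One way to read "superlevel knowledge sets with different values of the parameters" is as superlevel sets of an estimated density at varying thresholds. The sketch below illustrates this reading under that assumption; it is an interpretation, not the paper's algorithm, and the grid and thresholds are arbitrary.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 1.0, size=500)
kde = gaussian_kde(sample)  # density estimate for one class

def superlevel_set(grid, t):
    # Grid points whose estimated density is at least t; varying t
    # yields a nested family of candidate knowledge sets.
    dens = kde(grid)
    return grid[dens >= t]

grid = np.linspace(-4.0, 4.0, 81)
small = superlevel_set(grid, 0.30)  # high threshold -> tighter set
large = superlevel_set(grid, 0.05)  # low threshold  -> wider set
```

Because the condition `dens >= t` weakens as `t` decreases, the sets are nested, which is what makes a parameter sweep over thresholds well behaved.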
Experiments
We compare the performance of SVM with KSM on various real-world data sets. We chose all real-world data sets for binary classification from the LibSVM site [19], which originally come from the UCI Machine Learning Repository and Statlog (of the a1a–a9a data sets, we chose only a1a; the covtype data set is reduced to the first 25,000 data vectors); see the details about the data sets in Table 1. We use LIBSVM [20] for running SVM internally in all methods. For all data sets, every feature is scaled
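A setup of this kind (per-feature scaling followed by an internal LIBSVM solver) can be approximated with scikit-learn, whose `SVC` is backed by LIBSVM. The data set, split, and parameters below are stand-ins for illustration, not the paper's experimental protocol.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in binary data set; the paper uses LibSVM sets (a1a, covtype, ...).
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale every feature to [-1, 1], fitting the scaler on training data only.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_tr)

clf = SVC(kernel="rbf")  # scikit-learn's SVC wraps LIBSVM internally
clf.fit(scaler.transform(X_tr), y_tr)
acc = clf.score(scaler.transform(X_te), y_te)
print(f"test accuracy: {acc:.3f}")
```

Fitting the scaler on the training split only, then applying it to the test split, avoids leaking test-set statistics into the model.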
Summary
In this paper, we proposed a novel theoretical classification model of knowledge sets. We derived some bounds on the classification error. We proposed the incorporation of four models of uncertainty into the knowledge set model: a limited domain, prediction bands, limited prediction bands, and a limited measure of the uncertain set. We derived the SVM method from the proposed model. The advantages of the knowledge set model are a greater understanding of classification problems and a source of ideas for designing
Acknowledgments
I would like to express my sincere gratitude to Professor Witold Dzwinel and Marcin Kurdziel (AGH University of Science and Technology, Department of Computer Science) and Professor Stan Matwin (Dalhousie University) and Roman Staszczyk for discussion and useful suggestions. The research is financed by the National Science Centre, project id 217859, UMO-2013/09/B/ST6/01549, titled “Interactive Visual Text Analytics (IVTA): Development of novel user-driven text mining and visualization methods
Marcin Orchel received the Ph.D. degree in computer science from AGH University of Science and Technology, Poland, in 2013.
He is currently an Assistant Professor in the Department of Computer Science at AGH University of Science and Technology, Poland. His current research interests include support vector machines, machine learning, data mining, and stock price prediction.
References (28)
- et al., Incorporating prior knowledge in support vector machines for classification: a review, Neurocomputing (2008)
- et al., A theoretical framework for supervised learning from regions, Neurocomputing (2014)
- et al., Tuning SVM parameters by using a hybrid CLPSO-BFGS algorithm, Neurocomputing (2010)
- et al., A PSO and pattern search based memetic algorithm for SVMs parameters optimization, Neurocomputing (2013)
- et al., Knowledge-based support vector machine classifiers
- G.M. Fung, O.L. Mangasarian, J.W. Shavlik, Knowledge-based nonlinear kernel classifiers, in: Learning Theory and Kernel...
- et al., Nonlinear knowledge-based classification, IEEE Trans. Neural Netw. (2008)
- et al., Bayes point machines, J. Mach. Learn. Res. (2001)
- K. Crammer, T. Wagner, Volume regularization for binary classification, in: P.L. Bartlett, F.C.N. Pereira, C.J.C....
- Estimation of Dependences Based on Empirical Data, Springer Series in Statistics (1982)
- Large margin vs. large volume in transductive learning, Mach. Learn.
- Systemy uczace sie: rozpoznawanie wzorcow, analiza skupien i redukcja wymiarowosci [Learning systems: pattern recognition, cluster analysis and dimensionality reduction]