Neurocomputing

Volume 149, Part B, 3 February 2015, Pages 1109-1124

Solving classification problems by knowledge sets

https://doi.org/10.1016/j.neucom.2014.07.022

Abstract

We propose a novel theoretical model and a method for solving binary classification problems. First, we find knowledge sets in the input space by using estimated density functions. Then, we find the final solution outside the knowledge sets. We derive bounds on the classification error based on knowledge sets. We estimate knowledge sets from examples and find the solution by using support vector machines (SVM). We performed tests on various real-world data sets and achieved generalization performance similar to that of SVM with a significantly smaller number of support vectors.

Introduction

One possibility for improving generalization performance in classification problems is to incorporate additional knowledge, sometimes called prior knowledge. Various types of prior knowledge have already been incorporated into SVM. In [1], the authors distinguish two types of prior knowledge: knowledge about class invariance and knowledge about the data. The first type includes knowledge about classification in regions of the input space [2], [3], [4], [5], and knowledge about class invariance under transformations of the input. The second type includes knowledge about unlabeled examples, imbalance of classes, and quality of the data. In [2], [3], the authors informally proposed a concept of knowledge sets, for example cubes assumed to belong to one of the two categories; they concentrated on incorporating prior knowledge in the form of polyhedral knowledge sets. In this paper, instead of incorporating prior knowledge, we use the concept of knowledge sets to model a standard classification problem based only on training examples. We can interpret a knowledge set as information about the classification of a set of data points in the input space. A decision boundary is supposed to lie outside knowledge sets (in an uncertain set). A similar concept of uncertainty is related to version spaces, which were used in Bayes point machines (BPM) [6]. A version space is a set of hypotheses that are consistent with a training sample. A soft version space is a version space in which an error in classifying training data is allowed and is controlled by a parameter. From each version space, the BPM method selects a representative candidate solution as a Bayes point, which is approximated by the center of mass of a polyhedron. In [7], instead of a version space, the authors maintain a set of possible weight vectors in the form of an axis-aligned box, and they choose as candidate the center of mass of the box. In BPM, the final version space is chosen according to the empirical test error, while in [7] the authors compare different boxes by using principles from SVM: empirical risk minimization (ERM) and structural risk minimization (SRM) for the worst-case hypothesis from the box. They also added a third principle of large volume. The large volume transductive principle was briefly treated in [8] for the case of hyperplanes and extended in [9]. In our approach, we deal with uncertainty in the input space instead of in a hypothesis space. We propose a theoretical model of knowledge sets, in which we define knowledge sets and an uncertain set. The knowledge sets are defined purely on sets, without assuming any particular space for the elements or shapes such as boxes.

There are at least three models of a classification problem [10]: the risk minimization model; estimating the regression function of expected conditional probabilities of classification for given data; and the Bayes approach of estimating density functions for conditional probabilities of data for particular classes. None of these models is suitable for the concept of knowledge sets, so we propose a new classification model, called the knowledge set model.
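For concreteness, the Bayes approach mentioned above classifies by comparing class-conditional densities weighted by class priors. The following is the standard textbook rule, not a formula quoted from this paper:

```latex
% Standard Bayes decision rule for two classes c_1, c_2 with
% class-conditional densities p(x | c_i) and priors P(c_i):
\[
  h(x) =
  \begin{cases}
    c_1, & p(x \mid c_1)\, P(c_1) \ge p(x \mid c_2)\, P(c_2),\\[2pt]
    c_2, & \text{otherwise.}
  \end{cases}
\]
```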

Remark 1

In the knowledge set model, first, we generate knowledge sets. Then, we find a classifier based on the knowledge sets.

The best-known method of classification based on predicting density functions from sample data is the Bayes classifier, whose decision boundary is given by the intersection of the class density functions. Density functions can be estimated by using, for example, kernel density estimation (KDE) [11]. In this paper, we propose a classification method based on the knowledge set model, called knowledge set machines (KSM). In the proposed method, instead of predicting the decision boundary directly from estimated density functions, we add an intermediate step: constructing knowledge sets. Then, we find a classifier based on the knowledge sets by using the maximal margin principle, used for example in SVM. Knowledge sets can be interpreted as a partitioning of the input space. The best-known partitioning algorithms for classification are decision trees, which create boxes with a particular classification. There have been some attempts to improve partitioning by using tighter boxes that cover only part of the input space and using other boxes to classify the rest of the space [12].
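To make the two-step idea of Remark 1 concrete, here is a minimal Python sketch, not the authors' exact algorithm: class densities are estimated with KDE, points where one density clearly dominates form a crude knowledge set, and a maximal-margin classifier (an SVM) is trained on those points only, so that its boundary falls in the discarded uncertain region. The function name, the threshold theta, and the kernel choices are illustrative assumptions.

```python
# Sketch of the knowledge-set-then-classifier pipeline (assumed reading).
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.svm import SVC

def ksm_sketch(X, y, bandwidth=0.5, theta=1.5):
    X1, X2 = X[y == 1], X[y == -1]
    kde1 = KernelDensity(bandwidth=bandwidth).fit(X1)
    kde2 = KernelDensity(bandwidth=bandwidth).fit(X2)
    # log-densities of every training point under both class models
    f1, f2 = kde1.score_samples(X), kde2.score_samples(X)
    # keep points where one density dominates by a factor of theta;
    # the remaining points form the uncertain set and are discarded
    keep = np.abs(f1 - f2) >= np.log(theta)
    clf = SVC(kernel="rbf")  # LIBSVM-backed maximal-margin solver
    clf.fit(X[keep], y[keep])
    return clf

# usage: clf = ksm_sketch(X_train, y_train); clf.predict(X_test)
```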

The outline of the paper is as follows. First, we analyze the knowledge set model. Then, we present the KSM method based on this model. Finally, we show experiments and results. Introductions to SVM and density estimation are given in Appendix B (Support vector classification basics) and Appendix C (Density estimation basics), respectively.

Knowledge set model

At the beginning, we present some basic definitions and propositions. Notation for the knowledge set model is described in Appendix A. We will define some mathematical structures on a set, which consist of environment objects, common to the proposed structures, and main objects. We propose the following set of environment objects, E: a universe X of possible elements of the set S, a set C of possible classes, and a set M of mappings, mapping some of the x ∈ X to some class c ∈ C, x ↦ c. We …
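As a purely illustrative reading of the environment objects E, one can encode them for a finite universe as follows; the paper defines these structures purely set-theoretically, so the types and the use of None for unmapped elements are assumptions:

```python
# Hypothetical encoding of E = (X, C, M) for a finite universe.
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Optional

@dataclass(frozen=True)
class Environment:
    universe: FrozenSet[Hashable]                      # X: possible elements
    classes: FrozenSet[Hashable]                       # C: possible classes
    mapping: Callable[[Hashable], Optional[Hashable]]  # M: partial map x -> c

def knowledge_set(env: Environment, c: Hashable) -> FrozenSet[Hashable]:
    # elements known to belong to class c; unmapped elements stay uncertain
    return frozenset(x for x in env.universe if env.mapping(x) == c)

# usage: elements below 2 map to class 1, above 3 to class -1, rest unknown
env = Environment(frozenset(range(6)), frozenset({-1, 1}),
                  lambda x: 1 if x < 2 else (-1 if x > 3 else None))
print(knowledge_set(env, 1))   # frozenset({0, 1}); 2 and 3 remain uncertain
```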

The classification method based on knowledge sets

We propose a classification method based on knowledge sets, which we call KSM. Consider a binary classification problem with predicted f1(x) and f2(x). In the analysis of uncertain models, we found that we should look for margin and possibly superlevel knowledge sets. Because we do not know the prediction bands and the limited domain, we should test margin and superlevel knowledge sets with different values of the parameters. If we have a limited number of possible tests, we …
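A hedged sketch of the two candidate families of knowledge sets named above, for vectorized density estimates f1 and f2; the exact definitions are in the paper's model section (not excerpted here), so the parameters theta and eps below are illustrative stand-ins:

```python
import numpy as np

def superlevel_set(f, X, theta):
    # points whose estimated class density alone is at least theta
    return X[f(X) >= theta]

def margin_set(f1, f2, X, eps):
    # points where one estimated density dominates the other by at least eps
    return X[f1(X) - f2(X) >= eps]
```

Scanning a small grid of (theta, eps) values mirrors the remark above that both families should be tested with different parameter values.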

Experiments

We compare the performance of SVM and KSM on various real-world data sets. We chose all real-world data sets for binary classification from the LIBSVM site [19], which originally come from the UCI Machine Learning Repository and Statlog (for the aXa data sets, we chose only a1a; the covtype data set is reduced to the first 25,000 data vectors); see the details about the data sets in Table 1. We use LIBSVM [20] to run SVM internally in all methods. For all data sets, every feature is scaled …
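A sketch of this evaluation setup, as far as the snippet describes it, using scikit-learn's SVC, which wraps LIBSVM; the stand-in data set and the [-1, 1] scaling range are assumptions, since the text is cut off before stating them:

```python
from sklearn.datasets import load_breast_cancer   # stand-in for the LIBSVM data sets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# scale every feature using training-split statistics; [-1, 1] is common
# LIBSVM practice, assumed here
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_tr)
clf = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)   # LIBSVM-backed solver
print("test accuracy:", clf.score(scaler.transform(X_te), y_te))
print("support vectors:", clf.n_support_.sum())   # the quantity KSM aims to reduce
```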

Summary

In this paper, we proposed a novel theoretical classification model of knowledge sets. We derived some bounds on the classification error. We proposed incorporating four models of uncertainty into the knowledge set model: a limited domain, prediction bands, limited prediction bands, and a limited measure of the uncertain set. We derived the SVM method from the proposed model. The advantage of the knowledge set model is a greater understanding of classification problems and a source of ideas for designing …

Acknowledgments

I would like to express my sincere gratitude to Professor Witold Dzwinel and Marcin Kurdziel (AGH University of Science and Technology, Department of Computer Science), Professor Stan Matwin (Dalhousie University), and Roman Staszczyk for discussions and useful suggestions. The research is financed by the National Science Centre, project id 217859, UMO-2013/09/B/ST6/01549, titled “Interactive Visual Text Analytics (IVTA): Development of novel user-driven text mining and visualization methods …”

Marcin Orchel received the Ph.D. degree in computer science from AGH University of Science and Technology, Poland, in 2013.

He is currently an Assistant Professor in the Department of Computer Science at AGH University of Science and Technology, Poland. His current research interests include support vector machines, machine learning, data mining, and stock price prediction.

References (28)

  • R. El-Yaniv et al., Large margin vs. large volume in transductive learning, Mach. Learn. (2008)
  • M. Krzysko, Systemy uczace sie: rozpoznawanie wzorcow, analiza skupien i redukcja wymiarowosci [Learning systems: pattern recognition, cluster analysis and dimensionality reduction] (2009)
  • M. Kobos, J. Mandziuk, Classification based on combination of kernel density estimators, in: International Conference...
  • H. Wang, D. Bell, I. Düntsch, A density based approach to classification, in: Proceedings of 2003 ACM Symposium on...