Abstract
This paper describes in full detail a model of a hierarchical classifier (HC). The original classification problem is decomposed into several subproblems, and a weak classifier is built for each. Each subproblem consists of examples from a subset of the full set of output classes. It is essential to this classification framework that the generated subproblems overlap, i.e. some classes may belong to more than one subproblem. This overlap makes it possible to reduce the overall risk. The individual classifiers built for the subproblems are weak, i.e. only slightly more accurate than a random classifier. This paper extends the notion of weakness to the multiclass setting in a way that is more intuitive than the approaches proposed so far. In the HC model described, after a single node is trained, its problem is split into several subproblems using a clustering algorithm, which groups together classes that are classified similarly. The main focus of this paper is on finding the most appropriate clustering method: several algorithms are defined and compared. Finally, the complete HC is compared with other machine learning approaches.
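The overlapping-subproblem idea can be sketched in a few lines. The snippet below is a minimal illustration only, assuming a confusion-matrix similarity and a greedy joining rule; it is not the paper's actual clustering algorithm (the paper defines and compares several), and the function name and seeding scheme are hypothetical.

```python
import numpy as np

def overlapping_groups(confusion, n_groups, overlap=2):
    """Build overlapping class groups from a confusion matrix.

    Illustrative assumptions, not the paper's method: groups are seeded
    as contiguous blocks (a stand-in for a real clustering step), and
    each class greedily joins the seed groups of its most-confused
    classes, which is what makes the groups overlap.
    """
    n = confusion.shape[0]
    sim = confusion + confusion.T            # symmetric: how often i and j are mixed up
    np.fill_diagonal(sim, 0)
    # seed groups as contiguous blocks of classes
    seed = [set() for _ in range(n_groups)]
    for c in range(n):
        seed[c * n_groups // n].add(c)
    groups = [set(g) for g in seed]
    # overlap step: each class also joins the seed group of its most-confused classes
    for c in range(n):
        for nb in np.argsort(sim[c])[::-1][:overlap]:
            if sim[c, nb] == 0:              # no confusion -> no reason to join
                break
            for gi, g in enumerate(seed):
                if nb in g:
                    groups[gi].add(c)
                    break
    return [sorted(g) for g in groups]

# Toy confusion matrix for 4 classes: 0/1 and 2/3 are frequently confused,
# with a small 1/2 confusion that creates the overlap between groups.
conf = np.array([[10, 5, 0, 0],
                 [ 6, 9, 1, 0],
                 [ 0, 1, 8, 3],
                 [ 0, 0, 4, 9]])
print(overlapping_groups(conf, n_groups=2))  # → [[0, 1, 2], [1, 2, 3]]
```

Because classes 1 and 2 sit in both groups, a mistake by either subproblem classifier on those classes can still be corrected by the other, which is the mechanism behind the risk reduction claimed above.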
Podolak, I.T., Roman, A. CORES: fusion of supervised and unsupervised training methods for a multi-class classification problem. Pattern Anal Applic 14, 395–413 (2011). https://doi.org/10.1007/s10044-011-0204-3