Abstract
The error-correcting output codes (ECOC) technique is a powerful tool for learning and combining multiple binary learners for multi-class classification. It generally involves three steps: coding, dichotomizer learning, and decoding. In previous ECOC methods, the coding step and the dichotomizer learning step are usually performed independently. This simplifies the learning problem but may lead to unsatisfactory decoding results. To address this problem, we propose a novel model that learns the ECOC matrix and the dichotomizers jointly from data. We formulate the model as a nonlinear programming problem and develop an efficient alternating minimization algorithm to solve it. Specifically, for a fixed ECOC matrix, the model decomposes into a group of mutually independent quadratic programming problems, while for fixed dichotomizers, it is a difference of convex functions (DC) problem that can be solved with the concave-convex procedure (CCCP). Our experimental results on ten data sets from the UCI machine learning repository demonstrate the advantage of our model over state-of-the-art ECOC methods.
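For orientation, the conventional decoupled pipeline described above (a coding matrix fixed in advance, one dichotomizer trained per code column, then decoding) can be sketched as follows. This is not the joint model proposed in this article. It is a minimal sketch under stated assumptions: the code matrix is a hand-picked one-vs-all code, each dichotomizer is a nearest-class-mean linear rule standing in for an SVM, decoding uses Hamming distance, and all names (`CODE_MATRIX`, `train_dichotomizers`, `decode`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[Tuple[float, ...], float]  # (weights w, bias b) of one dichotomizer

# Coding step (hypothetical choice): a one-vs-all code for K = 3 classes.
# Row k is the codeword of class k; column l defines dichotomizer l.
# Entries are +1 / -1; a 0 entry (ternary ECOC) would simply exclude that
# class from the training set of column l and cost 1/2 during decoding.
CODE_MATRIX: List[List[int]] = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]


def centroid(points: Sequence[Vector]) -> Tuple[float, ...]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train_dichotomizers(xs: Sequence[Vector], ys: Sequence[int],
                        code: List[List[int]]) -> List[Model]:
    """Dichotomizer learning: one linear classifier f(x) = w.x + b per column.

    Each classifier is a nearest-class-mean rule, a simple stand-in for the
    SVM-style binary learners usually trained at this step.
    """
    models = []
    for l in range(len(code[0])):
        pos = [x for x, y in zip(xs, ys) if code[y][l] == +1]
        neg = [x for x, y in zip(xs, ys) if code[y][l] == -1]
        mu_p, mu_n = centroid(pos), centroid(neg)
        w = tuple(a - c for a, c in zip(mu_p, mu_n))
        # With this bias, f(x) > 0 exactly when x is closer to mu_p than to mu_n.
        b = (sum(a * a for a in mu_n) - sum(a * a for a in mu_p)) / 2.0
        models.append((w, b))
    return models


def decode(x: Vector, models: List[Model], code: List[List[int]]) -> int:
    """Hamming decoding: return the class whose codeword disagrees least with the outputs."""
    outputs = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
               for w, b in models]
    distances = [sum((1 - o * m) / 2 for o, m in zip(outputs, row)) for row in code]
    return min(range(len(code)), key=lambda k: distances[k])


if __name__ == "__main__":
    # Toy 2-D data: three well-separated clusters labelled 0, 1, 2.
    X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
    Y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    models = train_dichotomizers(X, Y, CODE_MATRIX)
    print([decode(x, models, CODE_MATRIX) for x in [(0.2, -0.1), (5.1, 0.3), (0.1, 4.8)]])
    # → [0, 1, 2]
```

Because each column is trained on its own, the dichotomizer step splits into independent subproblems once the matrix is fixed; this mirrors the abstract's observation that, for a fixed ECOC matrix, the model decomposes into independent quadratic programs. The proposed model differs in that it also updates the matrix itself, alternating with the dichotomizer step, rather than keeping `CODE_MATRIX` fixed as this sketch does.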
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC) under grant nos. 60825301 and 61075052. We thank Xu-Yao Zhang for helpful discussions.
About this article
Cite this article
Zhong, G., Huang, K. & Liu, CL. Joint learning of error-correcting output codes and dichotomizers from data. Neural Comput & Applic 21, 715–724 (2012). https://doi.org/10.1007/s00521-011-0653-z
DOI: https://doi.org/10.1007/s00521-011-0653-z