Abstract
Recently, one of the standard discriminative training methods for pattern classifier design, Minimum Classification Error (MCE) training, has been revised into a new version called Large Geometric Margin Minimum Classification Error (LGM-MCE) training. It is formulated by replacing the conventional misclassification measure, which is equivalent to the so-called functional margin, with a geometric margin that represents the geometric distance between an estimated class boundary and its closest training pattern sample. It seeks the trainable classifier parameters that simultaneously minimize the empirical average classification error count loss and maximize the geometric margin. Experimental evaluations have shown the fundamental utility of LGM-MCE training. However, to be fully effective, this new training method requires careful setting of its hyperparameters, especially the smoothness degree of the smooth classification error count loss. Exploring the smoothness degree usually requires many trial-and-error repetitions of training and testing, and such burdensome repetition does not necessarily lead to an optimal smoothness setting. To alleviate this problem, and to further increase the benefit of employing the geometric margin, in this paper we apply a new idea that automatically determines the loss smoothness in LGM-MCE training. We first re-formalize LGM-MCE training using the Parzen estimation of the classification error count risk and then incorporate into it a mechanism for automatic loss smoothness determination. Importantly, the geometric-margin-based misclassification measure adopted in LGM-MCE training is directly linked with the geometric margin in the pattern sample space. Based on this relation, we also prove that the loss smoothness affects the production of virtual samples along the estimated class boundaries in the pattern sample space.
Finally, through experimental evaluations and comparisons with other training methods, we elaborate on the characteristics of LGM-MCE training and of its new function that automatically determines an appropriate loss smoothness degree.
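The contrast drawn above between the functional and geometric margins can be sketched for a linear discriminant d(x) = w·x + b: the functional margin is the raw discriminant value, while the geometric margin normalizes it by the gradient norm ∥∇_x d∥ = ∥w∥, giving a true Euclidean distance to the class boundary. A minimal illustration follows; the function names and the sigmoid form of the smooth error count loss are illustrative choices following standard MCE formulations, not the paper's exact definitions:

```python
import numpy as np

def functional_margin(w, b, x):
    """Functional margin: the raw discriminant value d(x) = w.x + b."""
    return np.dot(w, x) + b

def geometric_margin(w, b, x):
    """Geometric margin: d(x) normalized by ||grad_x d|| = ||w||,
    i.e., the Euclidean distance from x to the boundary d(x) = 0."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

def smooth_error_loss(margin, alpha):
    """Sigmoid-smoothed 0-1 loss; alpha is the smoothness degree.
    A negative misclassification measure (correct decision) maps to
    a loss near 0; a positive one (error) maps to a loss near 1."""
    return 1.0 / (1.0 + np.exp(-alpha * margin))

# Scaling (w, b) by a constant changes the functional margin
# but leaves the geometric margin unchanged.
w, b = np.array([3.0, 4.0]), 1.0
x = np.array([1.0, 1.0])
print(functional_margin(w, b, x))           # 8.0
print(functional_margin(2 * w, 2 * b, x))   # 16.0
print(geometric_margin(w, b, x))            # 1.6
print(geometric_margin(2 * w, 2 * b, x))    # 1.6
```

The last two lines show why the geometric margin is the meaningful quantity to maximize: it is invariant to a rescaling of the trainable parameters, whereas the functional margin can be inflated arbitrarily without moving the boundary.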
Notes
While the α_y are set common to all classes in most MCE embodiments, we also treat the case of a different α_y for each class, in consideration of the introduction of the automatic loss smoothness determination described in Section 3.
Even in the linear discriminant function case, D_y(x; Λ) is not just a constant multiple of d_y(x; Λ), because the denominator ∥∇_x d_y(x; Λ)∥ depends on the trainable parameter set Λ.
LGM-MCE training is also applicable to spaces other than the original input space 𝒳. Indeed, some of the authors of this paper recently formulated a Kernel MCE method that applied LGM-MCE training to a kernel-based high-dimensional space and experimentally demonstrated its utility [23].
This relation is described in Chapter 4 of [7], too.
For large-scale applications, the CVML method may not be practical, since it repeats an O(N_y²) computation. In such cases, applying only the IQR-based method, which is just O(N_y) and requires no iterations, may be recommended.
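The O(N_y) flavor of an IQR-based smoothness setting can be illustrated with Silverman's classic rule-of-thumb bandwidth for Parzen estimation [28]; this is a standard stand-in sketch, and the estimator actually used in the paper may differ in its constants and form:

```python
import numpy as np

def iqr_bandwidth(samples):
    """Silverman-style rule-of-thumb Parzen bandwidth using the
    interquartile range: h = 0.9 * min(std, IQR/1.34) * N^(-1/5).
    Needs only one pass over the data (plus quantile computation)
    and no iterative optimization."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    sigma = x.std(ddof=1)
    return 0.9 * min(sigma, iqr / 1.34) * n ** (-0.2)

# e.g., misclassification-measure values of one class
rng = np.random.default_rng(0)
d = rng.normal(size=1000)
h = iqr_bandwidth(d)
print(h)
```

Using the IQR rather than the raw standard deviation makes the estimate robust to outlying samples far from the class boundary, which is what makes a single non-iterative pass sufficient.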
As discussed previously, the term “approximately” can be omitted for linear discriminant functions or Euclidean-distance-based prototype classifiers even if x_k^y is not close to the boundary.
Bishop revealed that the minimization of the sum-of-squares (or cross-entropy) error with random noise added to the input data (not to the transformed data points in the misclassification measure space) is almost equivalent to the minimization of a regularized error without noise [29]. Similarly, our approach, which produces virtual samples in the input pattern space (Fig. 5), may also minimize a regularized classification error without virtual samples. Theoretical analysis of this interesting issue remains future work.
The computation of the best prototype p_j assumes that the smooth “soft-max” function is approximated by the standard “max” operator for computational simplicity.
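The approximation mentioned in this note can be sketched as follows for a Euclidean-distance prototype classifier: a smooth log-sum-exp “soft-min” over prototype distances converges to the plain “min” (best-prototype) selection as the smoothness parameter ψ grows. The specific soft-min form below is an illustrative choice, not necessarily the one used in the paper:

```python
import numpy as np

def soft_min_distance(x, prototypes, psi):
    """Smooth soft-min of squared distances to the prototypes:
    -(1/psi) * log( mean_j exp(-psi * d_j) ), which approaches
    min_j d_j as psi -> infinity. Computed via a numerically
    stable log-sum-exp shift."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    m = d.min()
    return m - np.log(np.mean(np.exp(-psi * (d - m)))) / psi

x = np.array([0.0, 0.0])
protos = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
hard = np.min(np.sum((protos - x) ** 2, axis=1))
print(hard)                                        # 1.0
print(round(soft_min_distance(x, protos, 100.0), 3))  # 1.011
```

For large ψ the closest prototype dominates the sum, so replacing the soft operator with the hard “max”/“min” changes the result only by an O((log J)/ψ) term, where J is the number of prototypes.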
References
Jebara, T. (2004). Machine learning: Discriminative and generative. Kluwer.
Schlüter, R., Macherey, W., Müller, B., Ney, H. (2001). Comparison of discriminative training criteria and optimization methods for speech recognition. Speech Communication, 34, 287–310.
He, X., Deng, L., Chou, W. (2008). Discriminative learning in sequential pattern recognition. IEEE Signal Processing Magazine, 14–36.
Jiang, H. (2010). Discriminative training of HMMs for automatic speech recognition: a survey. Computer Speech & Language, 24(4), 589–608.
Juang, B.-H., & Katagiri, S. (1992). Discriminative learning for minimum error classification. IEEE Transactions on Signal Processing, 40(12), 3043–3054.
Katagiri, S., Juang, B.-H., Lee, C.-H. (1998). Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method. Proceedings of the IEEE, 86(11), 2345–2373.
Bishop, C.M. (2006). Pattern recognition and machine learning. New York: Springer-Verlag.
Duda, R.O., & Hart, P.E. (1973). Pattern classification and scene analysis. New York: Wiley.
Fukunaga, K. (1990). Introduction to statistical pattern recognition, 2nd edn. San Diego: Academic Press.
Vapnik, V.N. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press.
Abe, S. (2005). Support vector machines for pattern classification. London: Springer-Verlag.
He, T., Hu, Y., Huo, Q. (2007). An approach to large margin design of prototype-based pattern classifiers. Proceedings of the ICASSP, 2, II-625–628.
He, T., & Huo, Q. (2008). A study of a new misclassification measure for minimum classification error training of prototype-based pattern classifiers. Proceedings of the ICPR, 1–4.
Wang, Y., & Huo, Q. (2010). Sample-separation-margin based minimum classification error training of pattern classifiers with quadratic discriminant functions. Proceedings of the ICASSP, 1866–1869.
Watanabe, H., Katagiri, S., Yamada, K., McDermott, E., Nakamura, A., Watanabe, S., Ohsaki, M. (2010). Minimum error classification with geometric margin control. Proceedings of the IEEE ICASSP, 2170–2173.
Watanabe, H., & Katagiri, S. (2011). Minimum classification error training with geometric margin enhancement for robust pattern recognition. Proceedings of the IEEE MLSP (CD version), 1–6.
Freund, Y., & Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Liu, C., Jiang, H., Rigazio, L. (2006). Recent improvement on maximum relative margin estimation of HMMs for speech recognition. Proceedings of the ICASSP, 1, I-269–272.
Jiang, H., Li, X., Liu, C. (2006). Large margin hidden Markov models for speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1584–1595.
Li, J., Yuan, M., Lee, C.-H. (2007). Approximate test risk bound minimization through soft margin estimation. IEEE Transactions on Audio, Speech, and Language Processing, 15(8), 2393–2404.
Yu, D., Deng, L., He, X., Acero, A. (2008). Large-margin minimum classification error training: a theoretical risk minimization perspective. Computer Speech & Language, 22, 415–429.
Tanaka, H., Watanabe, H., Katagiri, S., Ohsaki, M. (2012). Experimental evaluation of kernel minimum classification error training. Proceedings of the IEEE TENCON, 1–6.
Watanabe, H., Tokuno, J., Ohashi, T., Katagiri, S., Ohsaki, M. (2011). Minimum classification error training with automatic setting of loss smoothness. Proceedings of the IEEE MLSP, 1–6.
McDermott, E., & Katagiri, S. (2004). A derivation of minimum classification error from the theoretical classification risk using Parzen estimation. Computer Speech & Language, 18, 107–122.
Ohashi, T., Tokuno, J., Watanabe, H., Katagiri, S., Ohsaki, M. (2011). Automatic loss smoothness determination for large geometric margin minimum classification error training. Proceedings of the IEEE TENCON, 260–264.
Ohashi, T., Watanabe, H., Tokuno, J., Katagiri, S., Ohsaki, M., Matsuda, S., Kashioka, H. (2011). Increasing virtual samples through loss smoothness determination in large geometric margin minimum classification error training. Proceedings of the IEEE ICASSP, 2081–2084.
Silverman, B.W. (1986). Density estimation for statistics and data analysis. Chapman & Hall/CRC.
Bishop, C.M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7, 108–116.
Sato, A., & Yamada, K. (1996). Generalized learning vector quantization. Advances in neural information processing systems (Vol. 8, pp. 423–429). MIT Press.
Sato, A., & Yamada, K. (1998). A formulation of learning vector quantization using a new misclassification measure. Proceedings ICPR1998, 322–325.
Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2, 265–292.
Additional information
This work was supported in part by Grant-in-Aid for Scientific Research (B), No. 22300064.
Cite this article
Watanabe, H., Ohashi, T., Katagiri, S. et al. Robust and Efficient Pattern Classification using Large Geometric Margin Minimum Classification Error Training. J Sign Process Syst 74, 297–310 (2014). https://doi.org/10.1007/s11265-013-0760-4