
A new constrained maximum margin approach to discriminative learning of Bayesian classifiers

  • Published in: Frontiers of Information Technology & Electronic Engineering

Abstract

We propose a novel discriminative learning approach for Bayesian pattern classification, called ‘constrained maximum margin (CMM)’. We define the margin between two classes as the difference between the minimum decision value over the positive samples and the maximum decision value over the negative samples. The learning problem is to maximize this margin under the constraint that each training pattern is classified correctly. The resulting nonlinear programming problem is solved using the sequential unconstrained minimization technique. We applied the proposed CMM approach to learn Bayesian classifiers based on Gaussian mixture models, and conducted experiments on 10 UCI datasets. The performance of our approach was compared with those of the expectation-maximization algorithm, the support vector machine, and other state-of-the-art approaches. The experimental results demonstrated the effectiveness of our approach.
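The margin definition in the abstract can be made concrete with a small sketch. Assuming the decision value is the log-likelihood ratio between class-conditional Gaussian mixtures (a common choice for GMM-based Bayesian classifiers; the toy 1-D models and sample values below are hypothetical, not from the paper), the CMM margin is the minimum decision value over positive training samples minus the maximum over negative ones:

```python
import math

def gmm_logpdf(x, comps):
    # comps: list of (weight, mean, std) tuples for a 1-D Gaussian mixture
    dens = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in comps)
    return math.log(dens)

def decision(x, pos_model, neg_model):
    # decision value: log p(x | positive class) - log p(x | negative class)
    return gmm_logpdf(x, pos_model) - gmm_logpdf(x, neg_model)

def cmm_margin(pos_samples, neg_samples, pos_model, neg_model):
    # CMM margin: min decision value on positives minus max on negatives;
    # it is positive exactly when a threshold separates the two classes
    return (min(decision(x, pos_model, neg_model) for x in pos_samples)
            - max(decision(x, pos_model, neg_model) for x in neg_samples))

# Hypothetical two-component positive model and one-component negative model
pos_model = [(0.5, -1.0, 0.5), (0.5, -2.0, 0.5)]
neg_model = [(1.0, 2.0, 0.7)]
pos_samples = [-1.2, -0.8, -1.9]
neg_samples = [1.8, 2.4]

print(cmm_margin(pos_samples, neg_samples, pos_model, neg_model))
```

The learning problem described above would then adjust the mixture parameters to maximize this quantity, subject to every training sample having a decision value of the correct sign; the paper solves that constrained problem with the sequential unconstrained minimization technique, which this sketch does not attempt to reproduce.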



Author information


Corresponding author

Correspondence to Zeng-min Geng.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 60973059 and 81171407) and the Program for New Century Excellent Talents in University, China (No. NCET-10-0044).


About this article


Cite this article

Guo, K., Liu, Xb., Guo, Lh. et al. A new constrained maximum margin approach to discriminative learning of Bayesian classifiers. Frontiers Inf Technol Electronic Eng 19, 639–650 (2018). https://doi.org/10.1631/FITEE.1700007

