Abstract
We propose a novel discriminative learning approach for Bayesian pattern classification, called ‘constrained maximum margin (CMM)’. We define the margin between two classes as the difference between the minimum decision value for positive samples and the maximum decision value for negative samples. The learning problem is to maximize the margin under the constraint that each training pattern is classified correctly. This nonlinear programming problem is solved using the sequential unconstrained minimization technique. We applied the proposed CMM approach to learn Bayesian classifiers based on Gaussian mixture models, and conducted experiments on 10 UCI datasets. The performance of our approach was compared with that of the expectation-maximization algorithm, the support vector machine, and other state-of-the-art approaches. The experimental results demonstrated the effectiveness of our approach.
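The margin definition in the abstract can be illustrated with a minimal sketch. It assumes that decision values have already been computed for each training sample, that label 1 marks the positive class, and that a sample is classified as positive when its decision value exceeds a threshold of 0. The function names `cmm_margin` and `all_correct` and the zero threshold are hypothetical choices made for illustration, not details taken from the article, and the sketch does not cover the optimization step.

```python
from typing import Sequence


def cmm_margin(decision_values: Sequence[float], labels: Sequence[int]) -> float:
    """Margin as worded in the abstract: the minimum decision value over
    positive samples minus the maximum decision value over negative samples.
    Assumption: label 1 is the positive class, anything else is negative."""
    pos = [d for d, y in zip(decision_values, labels) if y == 1]
    neg = [d for d, y in zip(decision_values, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    return min(pos) - max(neg)


def all_correct(decision_values: Sequence[float], labels: Sequence[int],
                threshold: float = 0.0) -> bool:
    """Constraint check: every training pattern is classified correctly.
    Assumption: a sample is predicted positive when its decision value
    is strictly greater than `threshold`."""
    return all((d > threshold) == (y == 1)
               for d, y in zip(decision_values, labels))


# Toy decision values for two positive and two negative samples.
values = [2.0, 0.5, -1.0, -0.25]
labels = [1, 1, 0, 0]
margin = cmm_margin(values, labels)      # 0.5 - (-0.25)
feasible = all_correct(values, labels)   # all four samples on the right side
```

Under these assumptions, a learner would adjust the classifier parameters so that `margin` grows while `feasible` stays true.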
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 60973059 and 81171407) and the Program for New Century Excellent Talents in University, China (No. NCET-10-0044).
About this article
Cite this article
Guo, K., Liu, Xb., Guo, Lh. et al. A new constrained maximum margin approach to discriminative learning of Bayesian classifiers. Frontiers Inf Technol Electronic Eng 19, 639–650 (2018). https://doi.org/10.1631/FITEE.1700007
DOI: https://doi.org/10.1631/FITEE.1700007
Key words
- Discriminative learning
- Statistical modeling
- Bayesian pattern classifiers
- Gaussian mixture models
- UCI datasets