Abstract
Kernel-based methods have been widely investigated in the soft-computing community. However, they focus mainly on numeric data. In this paper, we propose a novel method for kernel learning on categorical data, and show how the method can be used to derive effective classifiers for linear classification. Based on kernel density estimation for categorical attributes, three popular classification methods, i.e., Naive Bayes, nearest neighbor and prototype-based classification, are effectively extended to classify categorical data. We also propose two data-driven approaches to the bandwidth selection problem, one aimed at minimizing the mean squared error of the kernel estimate and the other at optimizing the attribute weights. Theoretical analysis indicates that, as in the numeric case, kernel learning of categorical attributes can make the classes more separable, resulting in outstanding performance of the new classifiers on various real-world data sets.
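To make the idea of kernel density estimation for a categorical attribute concrete, the following is a minimal sketch that assumes the classical Aitchison-Aitken kernel, which may differ from the kernel actually used in the paper. The function name `aa_kernel_density` and its parameters are hypothetical and chosen only for illustration. Under this kernel, the smoothed estimate for a category is a blend of its relative frequency and a uniform share, controlled by the bandwidth.

```python
from collections import Counter


def aa_kernel_density(values, categories, bandwidth):
    """Smoothed probability estimate for one categorical attribute.

    Assumes the Aitchison-Aitken kernel: weight (1 - bandwidth) when two
    values match, and bandwidth / (c - 1) otherwise, where c is the number
    of categories. This is an illustrative choice, not the paper's method.
    """
    c = len(categories)
    if c < 2:
        raise ValueError("need at least two categories")
    if not values:
        raise ValueError("need at least one observation")
    # The kernel is valid only for bandwidths in [0, (c - 1) / c].
    if not 0.0 <= bandwidth <= (c - 1) / c:
        raise ValueError("bandwidth must lie in [0, (c - 1) / c]")

    n = len(values)
    counts = Counter(values)
    estimate = {}
    for cat in categories:
        freq = counts.get(cat, 0) / n  # plain relative frequency
        # Averaging the kernel over all samples gives this closed form.
        estimate[cat] = (1.0 - bandwidth) * freq + bandwidth / (c - 1) * (1.0 - freq)
    return estimate


if __name__ == "__main__":
    sample = ["a", "a", "b", "a"]
    print(aa_kernel_density(sample, ["a", "b", "c"], bandwidth=0.3))
```

In this sketch a bandwidth of 0 reproduces the frequency estimate, while the maximum bandwidth (c - 1) / c yields the uniform distribution. Selecting a value between these extremes is the bandwidth selection problem the abstract refers to.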
Acknowledgments
L. Chen and G. Guo’s work was supported by the National Natural Science Foundation of China under Grant No. 61175123, and the Fujian Normal University Innovative Research Team (IRTL1207). L. Chen’s work was also supported by the Natural Science Foundation of Fujian Province of China under Grant No. 2015J01238. J. Zhu’s work was supported by the National Social Science Foundation of China (Major Program 13&ZD148).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by D. Neagu.
Cite this article
Chen, L., Ye, Y., Guo, G. et al. Kernel-based linear classification on categorical data. Soft Comput 20, 2981–2993 (2016). https://doi.org/10.1007/s00500-015-1926-8
DOI: https://doi.org/10.1007/s00500-015-1926-8