Abstract
In multivariate classification problems, 2D visualization methods can be very useful for understanding the data properties, provided that they transform the n-dimensional data into a set of 2D patterns that are similar to the original data from the classification point of view. Similarity here means that a classifier behaves similarly on the original n-dimensional patterns and on their 2D mapped counterparts, i.e., its performance on the mapped patterns should not be much lower than on the original ones. We propose several simple and efficient mapping methods that allow classification problems to be visualized in 2D. To preserve the structure of the original classification problem, the mappings minimize different class overlap measures, combined with different functions (linear, quadratic and polynomial of several degrees) from \({\mathbb {R}}^n\) to \({\mathbb {R}}^2\). They can also map new (out-of-sample) data points, not used while learning the mapping, into \({\mathbb {R}}^2\). This is one of the main benefits of the proposed methods, since few supervised mappings offer this capability. For 71 data sets of the UCI repository, we compare the SVM performance on the original and on the 2D mapped patterns. The comparison also includes 34 other popular supervised and unsupervised dimensionality reduction methods, some of them used for the first time in classification. One of the proposed methods, Polynomial Kernel Discriminant Analysis of degree 2 (PKDA2), outperforms the remaining mappings. Compared to the original n-dimensional patterns, PKDA2 achieves 82% of the performance (measured by the Cohen kappa), matching or raising the performance for 26.8% of the data sets. For 36.6% of the data sets, the performance is reduced by less than 10%, and it drops by more than 20% for only 22.5% of the data sets. This small reduction shows that the 2D maps created by PKDA2 faithfully represent the original data, largely preserving their classifiability. Besides, PKDA is very fast, with running times of the same order as LDA. The MATLAB code is available.
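The abstract describes PKDA2 only at a high level. Its mechanics follow the pattern of kernel (generalized) discriminant analysis with a degree-2 polynomial kernel: solve a generalized eigenproblem on the centered kernel matrix, keep the two leading discriminant directions, and obtain out-of-sample projections through kernel evaluations against the training set. The MATLAB sketch below illustrates that idea under explicit assumptions: a GDA-style between/total scatter formulation, the inhomogeneous kernel \((x^\top x' + 1)^2\), and a small ridge term for numerical stability. The function names pkda2_fit and pkda2_map are illustrative only; this is not the authors' released code.

```matlab
% Minimal sketch of degree-2 Polynomial Kernel Discriminant Analysis.
% Assumptions: GDA-style formulation, numeric class labels in y.
function [Z, model] = pkda2_fit(X, y)
    % X: N-by-n training patterns, y: N-by-1 class labels
    N = size(X, 1);
    K = (X * X' + 1).^2;              % inhomogeneous polynomial kernel, degree 2
    J = eye(N) - ones(N) / N;         % centering matrix
    Kc = J * K * J;                   % centered kernel matrix
    classes = unique(y);
    W = zeros(N);                     % block-constant class membership matrix
    for c = 1:numel(classes)
        idx = (y == classes(c));
        W(idx, idx) = 1 / sum(idx);
    end
    M = Kc * W * Kc;                  % between-class scatter in feature space
    T = Kc * Kc + 1e-6 * eye(N);      % total scatter, ridge-regularized
    [A, D] = eig(M, T);               % generalized eigenproblem
    [~, ord] = sort(diag(D), 'descend');
    A = real(A(:, ord(1:2)));         % two leading discriminant directions
    Z = Kc * A;                       % 2D embedding of the training patterns
    model = struct('X', X, 'A', A, 'J', J, 'colmean', mean(K, 1));
end

% Out-of-sample projection: new points enter only through kernel
% evaluations against the training set, so no retraining is needed.
function Zt = pkda2_map(model, Xt)
    Nt = size(Xt, 1);
    Kt = (Xt * model.X' + 1).^2;                         % test-vs-train kernel
    Ktc = (Kt - ones(Nt, 1) * model.colmean) * model.J;  % center as in training
    Zt = Ktc * model.A;                                  % 2D coordinates
end
```

Typical usage would be `[Ztr, model] = pkda2_fit(Xtr, ytr)` followed by `Zte = pkda2_map(model, Xte)`; the second call realizes the out-of-sample property the abstract highlights, since the test patterns never participate in the eigenproblem.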
Funding
This work was funded by the Erasmus Mundus Action 2 programme, Strand 1, Lot 2, PEACE II, under Project Code 2013-2443/001-001.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Alawadi, S., Fernández-Delgado, M., Mera, D. et al. Polynomial Kernel Discriminant Analysis for 2D visualization of classification problems. Neural Comput & Applic 31, 3515–3531 (2019). https://doi.org/10.1007/s00521-017-3290-3