Classification using distances from samples to linear manifolds

  • Short Paper
  • Published in Pattern Analysis and Applications

Abstract

A classifier is proposed in which the distances from samples to linear manifolds (DSL) are used to perform classification. For each class, a linear manifold is built whose dimension is high enough to pass through all the training samples of that class. The distance from a query sample to a linear manifold is converted into the distance from a point to a linear subspace, and a simple, stable formula for this distance is derived using the geometry of the Gram matrix together with a regularization technique. The query sample is assigned to the class whose linear manifold is nearest. Experiments on one synthetic data set, thirteen binary-class data sets, and six multi-class data sets show that the classification performance of DSL is competitive. On most of the data sets, DSL outperforms the compared classifiers based on k nearest samples or subspaces, and on some data sets it is even superior to support vector machines. A further experiment demonstrates that the test efficiency of DSL is also competitive with kNN and related state-of-the-art classifiers on many data sets.
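To make the decision rule concrete, the following is a minimal sketch of a DSL-style classifier with a linear kernel (so the feature map is the identity). It is an illustration under our own naming conventions, not the authors' implementation; `mu` plays the role of the regularization parameter.

```python
import numpy as np

def dsl_distance(query, Z, mu=1e-3):
    """Regularized squared distance from `query` to the linear manifold
    passing through the columns of Z (one class's training samples).
    The manifold is z_1 + span{z_2 - z_1, ..., z_m - z_1}; the ridge
    term mu*I keeps the Gram matrix invertible.  Linear-kernel sketch
    only; names are illustrative, not taken from the paper's code."""
    z1 = Z[:, :1]                              # anchor point on the manifold
    C = Z[:, 1:] - z1                          # spanning directions
    a = query.reshape(-1, 1) - z1              # query relative to the anchor
    G = C.T @ C + mu * np.eye(C.shape[1])      # regularized Gram matrix
    proj = C @ np.linalg.solve(G, C.T @ a)     # regularized projection of a onto span(C)
    return (a.T @ (a - proj)).item()           # a'a - a'C(C'C + mu I)^{-1}C'a

def dsl_classify(query, class_samples, mu=1e-3):
    """Assign `query` to the class whose linear manifold is nearest."""
    return int(np.argmin([dsl_distance(query, Z, mu) for Z in class_samples]))

# toy usage: two classes, three samples each, in R^4
rng = np.random.default_rng(0)
classes = [rng.normal(loc=c, size=(4, 3)) for c in (0.0, 3.0)]
print(dsl_classify(rng.normal(loc=3.0, size=4), classes))    # expected: 1
```

In the kernel setting of the paper, the inner products above are evaluated through k(·, ·) on the mapped samples ψ(·), which leads to the Gram-matrix expression derived in Appendix 1.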


Notes

  1. Available at http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/neural/bench/cmu/bench.tgz.

Abbreviations

\(I_{m}\): The identity matrix of dimension m; I denotes an identity matrix of appropriate dimension

n: The dimension of the input space

\(\mathcal{X}_{i,j}\): The matrix \([x_{i},\ldots,x_{j}]\) with \(x_{i},\ldots,x_{j} \in \mathbb{R}^{n}\)

k(x, y): A kernel function

|A|: The determinant of a square matrix A, or the absolute value of a scalar A

h: The total number of classes

\(m_{i}\): The number of training samples in the ith class

\(l_{i}\): An \((m_{i}-1)\)-dimensional vector with all entries equal to 1

\(z_{i,j}\): The jth training sample of the ith class, with 1 ≤ i ≤ h and 1 ≤ j ≤ \(m_{i}\)

\(\mathcal{Z}_{i}\): The matrix \([z_{i,1},\ldots,z_{i,m_{i}}]\)

\(\mathcal{Z}_{i,j,k}\): The matrix \([z_{i,j},\ldots,z_{i,k}]\) with j ≤ k

\(s_{q}\): A query sample

References

  1. Fix E, Hodges J (1951) Discriminatory analysis, nonparametric discrimination: consistency properties. Tech. Rep. 4, USAF School of Aviation Medicine, Randolph Field

  2. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27

  3. Li B, Chen YW, Chen YQ (2008) The nearest neighbor algorithm of local probability centers. IEEE Trans Syst Man Cybern Part B 38:141–154

  4. Wang L, Suter D (2007) Learning and matching of dynamic shape manifolds for human action recognition. IEEE Trans Image Process 16:1646–1661

  5. Ge SS, Yang Y, Lee TH (2008) Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis Comput 26:1607–1620

  6. García-Pedrajas N (2009) Constructing ensembles of classifiers by means of weighted instance selection. IEEE Trans Neural Netw 20:258–277

  7. Sánchez JS, Mollineda RA, Sotoca JM (2007) An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Anal Appl 10(3):189–201

  8. Wang J, Neskovic P, Cooper LN (2006) Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recognit 39:417–423

  9. Fayed HA, Atiya AF (2009) A novel template reduction approach for the k-nearest neighbor method. IEEE Trans Neural Netw 20:890–896

  10. Athitsos V, Alon J, Sclaroff S, Kollios G (2008) BoostMap: an embedding method for efficient nearest neighbor retrieval. IEEE Trans Pattern Anal Mach Intell 30:89–104

  11. Domeniconi C, Peng J, Gunopulos D (2002) Locally adaptive metric nearest-neighbor classification. IEEE Trans Pattern Anal Mach Intell 24:1281–1285

  12. Hastie T, Tibshirani R (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18:607–616

  13. Weinberger KQ, Blitzer J, Saul LK (2006) Distance metric learning for large margin nearest neighbor classification. In: NIPS. MIT Press, Cambridge

  14. Zuo W, Zhang D, Wang K (2008) On kernel difference-weighted k-nearest neighbor classification. Pattern Anal Appl 11(3–4):247–257

  15. Alkoot FM, Kittler J (2002) Moderating k-NN classifiers. Pattern Anal Appl 5(3):326–332

  16. García V, Mollineda RA, Sánchez JS (2008) On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Anal Appl 11:269–280

  17. Zhang P, Peng J, Domeniconi C (2005) Kernel pooled local subspaces for classification. IEEE Trans Syst Man Cybern Part B 35:489–502

  18. Balachander T, Kothari R (1999) Kernel based subspace pattern classification. In: Proc Int Joint Conf Neural Netw 5:3119–3122

  19. Nalbantov GI, Groenen PJF, Bioch JC (2007) Nearest convex hull classification. Tech. Rep. EI 2006-50, Econometric Institute

  20. Kumar MP, Torr P, Zisserman A (2007) An invariant large margin nearest neighbour classifier. In: IEEE 11th International Conference on Computer Vision (ICCV 2007), vol 2, pp 1–8

  21. Vincent P, Bengio Y (2001) K-local hyperplane and convex distance nearest neighbor algorithms. In: NIPS

  22. Cevikalp H, Larlus D, Neamtu M, Triggs B, Jurie F (2010) Manifold based local classifiers: linear and nonlinear approaches. J Signal Process Syst 61(1):61–73

  23. Cevikalp H, Triggs B, Polikar R (2008) Nearest hyperdisk methods for high-dimensional classification. In: Cohen WW, McCallum A, Roweis ST (eds) ICML. ACM International Conference Proceeding Series, vol 307. ACM, Helsinki, pp 120–127

  24. Samet H (2008) K-nearest neighbor finding using MaxNearestDist. IEEE Trans Pattern Anal Mach Intell 30:243–252

  25. Cristescu R (1977) Topological vector spaces. Editura Academiei, Bucharest

  26. Lee J, Zhang C (2006) Classification of gene-expression data: the manifold-based metric learning way. Pattern Recognit 39:2450–2463

  27. Vapnik VN (1998) Statistical learning theory. Wiley-Interscience, New York

  28. Barth N (1999) The Gramian and k-volume in n-space: some classical results in linear algebra. J Young Investig 2 (Online; accessed 19 July 2011)

  29. Simard PY, LeCun YA, Denker JS, Victorri B (1998) Transformation invariance in pattern recognition: tangent distance and tangent propagation. In: Orr GB, Müller K-R (eds) Neural networks: tricks of the trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, pp 239–274

  30. Mangasarian OL, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28:69–74

  31. Cawley G, Talbot N (2007) Miscellaneous MATLAB software (Online; accessed 19 July 2011)

  32. Asuncion A, Newman D (2007) UCI machine learning repository (Online; accessed 19 July 2011)

  33. Gantmacher F (1958) Matrizenrechnung I. VEB Deutscher Verlag der Wissenschaften, Berlin

  34. Meyer CD (2001) Matrix analysis and applied linear algebra. SIAM, Philadelphia

  35. Camastra F, Vinciarelli A (2002) Estimating the intrinsic dimension of data with a fractal-based method. IEEE Trans Pattern Anal Mach Intell 24:1404–1407

  36. Aster R, Borchers B, Thurber C (2005) Tikhonov regularization. Int Geophys 90:89–118

  37. Vapnik VN (1999) The nature of statistical learning theory. Information Science and Statistics. Springer, Berlin

  38. Nene SA, Nayar SK, Murase H (1996) Columbia University image library (Online; accessed 20 July 2011)

  39. Keerthi SS, Lin C-J (2003) Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput 15:1667–1689

  40. Vieira DAG, Takahashi RHC, Palade V, Vasconcelos JA, Caminhas WM (2008) The Q-norm complexity measure and the minimum gradient method: a novel approach to the machine learning structural risk minimization problem. IEEE Trans Neural Netw 19:1415–1430

  41. Xu Z, Dai M, Meng D (2009) Fast and efficient strategies for model selection of Gaussian support vector machine. IEEE Trans Syst Man Cybern Part B 39:1292–1307

  42. Mu T, Nandi AK (2009) Multiclass classification based on extended support vector data description. IEEE Trans Syst Man Cybern Part B 39:1206–1216

  43. Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge

  44. Liu Y, You Z, Cao L (2006) A novel and quick SVM-based multi-class classifier. Pattern Recognit 39:2258–2264

  45. Graf ABA, Smola AJ, Borer S (2003) Classification in a normalized feature space using support vector machines. IEEE Trans Neural Netw 14:597–605


Acknowledgments

We thank the Editors and Reviewers for the time and effort spent in processing our paper. This work is supported by NSFC under Grants 61173182 and 61179071, and by SRFDP under Grant 20090181110052.

Author information

Corresponding author

Correspondence to Yiguang Liu.

Appendix 1: Proof of (16)

Proof:

For a matrix \(\mathcal{C}\) of suitable dimensions, the Sherman–Morrison–Woodbury formula [34] gives the following relation:

$$ {\mathcal{C}}(\mu I+{\mathcal{C}}^{\rm T}{\mathcal{C}})^{-1}{\mathcal{C}}^{\rm T} =I-(I+\mu^{-1}{\mathcal{C}}{\mathcal{C}}^{\rm T})^{-1} $$
(17)

Combining (8) with (17), it follows that

$$ \begin{aligned} d_{i}^{s_{q}}&=[\psi(s_{q})-\psi(z_{i,1})]^{\rm T}[\psi(s_{q})-\psi(z_{i,1})]- [\psi(s_{q})-\psi(z_{i,1})]^{\rm T}[\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}]\\ &\quad\times\left[[\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}]^{\rm T}[\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}]+\mu I\right]^{-1}\\ &\quad\times [\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}]^{\rm T}[\psi(s_{q})-\psi(z_{i,1})]\\ &=[\psi(s_{q})-\psi(z_{i,1})]^{\rm T}[\psi(s_{q})-\psi(z_{i,1})]- [\psi(s_{q})-\psi(z_{i,1})]^{\rm T}\\ &\quad \times\left[I-\mu\left[[\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}] [\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}]^{\rm T}+\mu I\right]^{-1}\right][\psi(s_{q})-\psi(z_{i,1})]\\ &=[\psi(s_{q})-\psi(z_{i,1})]^{\rm T} \mu\left[[\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}] [\psi({\mathcal{Z}}_{i,2,m_{i}})-\psi(z_{i,1})l^{\rm T}_{i}]^{\rm T}+\mu I\right]^{-1}[\psi(s_{q})-\psi(z_{i,1})]\\ \end{aligned} $$

which is (16). \(\square\)
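As a quick numerical sanity check (ours, not part of the paper), the snippet below verifies identity (17) for a random matrix and confirms that the final expression above agrees with the regularized distance it was derived from, taking ψ to be the identity map (linear kernel); all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, mu = 6, 4, 0.1

# Identity (17): C (mu I + C'C)^{-1} C' = I - (I + mu^{-1} C C')^{-1}
C = rng.normal(size=(n, m))
lhs = C @ np.linalg.solve(mu * np.eye(m) + C.T @ C, C.T)
rhs = np.eye(n) - np.linalg.inv(np.eye(n) + C @ C.T / mu)
assert np.allclose(lhs, rhs)

# Equivalence of the regularized distance and the final expression, with psi = identity
Z = rng.normal(size=(n, m))                 # training samples z_{i,1}, ..., z_{i,m_i} of one class
s = rng.normal(size=(n, 1))                 # query sample s_q
a = s - Z[:, :1]                            # psi(s_q) - psi(z_{i,1})
B = Z[:, 1:] - Z[:, :1]                     # psi(Z_{i,2,m_i}) - psi(z_{i,1}) l_i'
d_start = a.T @ a - a.T @ B @ np.linalg.solve(B.T @ B + mu * np.eye(m - 1), B.T @ a)
d_final = mu * a.T @ np.linalg.solve(B @ B.T + mu * np.eye(n), a)
assert np.allclose(d_start, d_final)
print("(17) and the derivation of (16) check out numerically")
```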

About this article

Cite this article

Liu, Y., Cao, X. & Liu, J.G. Classification using distances from samples to linear manifolds. Pattern Anal Applic 16, 417–430 (2013). https://doi.org/10.1007/s10044-011-0242-x
