Abstract
Techniques for choosing the number of features to retain in principal component analysis are usually ad hoc and subjective. In this paper, we use a method for finding the number of features that is based on the saturation behavior of a graph and is therefore not ad hoc. It gives a lower bound on the number of features to be selected. We apply this method to reduce the dimensions of the images in a database of handwritten digits. We also compare it with two conventional methods, scree and cumulative percentage, both of which are based on the eigenvalues of the database covariance matrix. The Mahalanobis and Bhattacharyya distances are shown to be of little use in determining the number of reduced dimensions.
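As a rough illustration of the two conventional baselines named in the abstract, the sketch below computes the eigenvalues of a data covariance matrix and applies a cumulative percentage rule. It does not implement the graph-saturation method of this paper, which the abstract does not specify. The function names and the 0.90 threshold are assumptions chosen for the example, and a scree plot would simply plot the same sorted eigenvalues against their index.

```python
import numpy as np


def covariance_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows are samples), largest first."""
    cov = np.cov(X, rowvar=False)
    # eigvalsh is meant for symmetric matrices and returns ascending real values.
    eigvals = np.linalg.eigvalsh(cov)
    # Reverse to descending order and clip tiny negative round-off to zero.
    return np.clip(eigvals[::-1], 0.0, None)


def components_by_cumulative_percentage(eigvals, threshold=0.90):
    """Smallest k whose k largest eigenvalues reach `threshold` of the total.

    `threshold` is a hypothetical cut-off (a fraction in (0, 1]); choosing it
    is exactly the subjective step that the abstract criticises.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratios, threshold)) + 1
    # Guard against round-off leaving the last ratio slightly below 1.0.
    return min(k, len(eigvals))
```

For a matrix `X` with one flattened digit image per row, `components_by_cumulative_percentage(covariance_eigenvalues(X))` would give the number of components retained under this rule.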
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yektaii, M., Bhattacharya, P. (2007). Cumulative Global Distance for Dimension Reduction in Handwritten Digits Database. In: Qiu, G., Leung, C., Xue, X., Laurini, R. (eds) Advances in Visual Information Systems. VISUAL 2007. Lecture Notes in Computer Science, vol 4781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76414-4_22
DOI: https://doi.org/10.1007/978-3-540-76414-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76413-7
Online ISBN: 978-3-540-76414-4
eBook Packages: Computer Science, Computer Science (R0)