Abstract
Visual media data, such as images, are the raw data representation for many important applications. Reducing the dimensionality of raw visual media data is desirable, since high dimensionality degrades not only the effectiveness but also the efficiency of visual recognition algorithms. We present a comparative study of spatial interest pixels (SIPs), including eight-way (a novel SIP detector), Harris, and Lucas-Kanade, whose extraction is an important step in reducing the dimensionality of visual media data. Through extensive case studies, we show the usefulness of SIPs as low-level features of visual media data. A class-preserving dimension reduction algorithm (using the GSVD) is applied to further reduce the dimension of SIP-based feature vectors. The experiments show its superiority over PCA.
Notes
We strictly distinguish the term feature from the term feature vector in the context of media-based classification applications. The former refers to color, texture, shape, and pixels, whereas the latter refers to the representation of an image/video instance that is ready to be fed into a classifier.
In the context of image retrieval or 3D computer vision, these are called interest points. We rename them interest pixels to avoid confusion between image points and data points (i.e., feature vectors).
In ten-fold cross validation, the entire dataset is first split into ten pieces, and the test is run ten times. Each time, nine pieces are used as training data and the remaining piece is used as test data. The final accuracy estimate is the mean of the ten estimates.
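The protocol described in this note can be sketched in a few lines of Python; `train_and_score` is a placeholder for whatever classifier the experiments use, assumed to fit on the training pieces and return its accuracy on the held-out piece:

```python
import random

def ten_fold_cv(dataset, train_and_score, k=10, seed=0):
    """Estimate accuracy by k-fold cross validation.

    `dataset` is a list of (feature_vector, label) pairs;
    `train_and_score(train, test)` fits a classifier on `train`
    and returns its accuracy on `test`.
    """
    data = list(dataset)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]   # k near-equal pieces
    accuracies = []
    for i in range(k):
        test = folds[i]                      # one piece held out
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        accuracies.append(train_and_score(train, test))
    return sum(accuracies) / k               # mean of the k estimates
```

Each sample appears in exactly one test fold, so every instance is used for testing once and for training nine times.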
Eigenspace and Fisherspace refer to the reduced spaces obtained via PCA and LDA (either classical or generalized), respectively.
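The Eigenspace half of this pairing (PCA) amounts to projecting centered data onto the leading right singular vectors; a minimal numpy sketch, with the target dimension `d` left as a free parameter:

```python
import numpy as np

def pca_project(A, d):
    """Project the rows of A onto the top-d principal components."""
    A = np.asarray(A, float)
    mean = A.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
    return (A - mean) @ Vt[:d].T
```

Unlike the class-preserving (Fisherspace) reduction in the Appendix, this projection ignores class labels and maximizes retained variance only.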
References
Arya S (1995) Nearest neighbor searching and applications. PhD thesis, University of Maryland, College Park, Maryland
Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE TPAMI 19(7):711–720
Bergen J, Landy M (1991) Computational modeling of visual texture segregation. In Computational Models of Visual Perception. MIT Press, Cambridge, Massachusetts, pp 253–271
Chellappa R, Wilson C, Sirohey S (1995) Human and machine recognition of faces: a survey. Proc IEEE 83(5):705–740
Ekman P, Friesen W (1976) Pictures of facial affect. Consulting Psychologists Press, Palo Alto, California
Fisher R (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7:179–188
Gevers T, Smeulders AWM (1998) Image indexing using composite color and shape invariant features. In ICCV, pp 576–581
Hancock P, Burton A, Bruce V (1996) Face processing: human perception and principal components analysis. Mem Cogn 24:26–40
Harris C, Stephens M (1988) A combined corner and edge detector. In Proc. 4th Alvey Vision Conference, Manchester, pp 147–151
Huber P (1981) Robust statistics. Wiley
Jolliffe I (1986) Principal component analysis. Springer, New York
Joyce D, Lewis P, Tansley R, Dobie M, Hall W (2000) Semiotics and agents for integrating and navigating through multimedia representations of concepts. In Proceedings of SPIE Vol. 3972, Storage and Retrieval for Media Databases 2000, pp 132–143
Lin W-H, Hauptmann A (2002) News video classification using SVM-based multimodal classifiers and combination strategies. In ACM Multimedia, Juan-les-Pins, France, pp 323–326
Loan CV (1976) Generalizing the singular value decomposition. SIAM J Numer Anal 13(1):76–83
Loupias E, Sebe N (1999) Wavelet-based salient points for image retrieval. In RR 99.11, Laboratoire Reconnaissance de Formes et Vision, INSA Lyon, November
Lu Y, Hu C, Zhu X, Zhang H, Yang Q (2000) A unified framework for semantics and feature based relevance feedback in image retrieval systems. In ACM Multimedia, pp 31–37
Lucas BD, Kanade T (1981) An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, pp 674–679
Lyons M, Budynek J, Akamatsu S (1999) Automatic classification of single facial images. IEEE TPAMI 21(12):1357–1362
Martinez A, Benavente R (1998) The AR face database. Technical Report CVC Tech. Report No. 24
Martinez A, Kak A (2001) PCA versus LDA. IEEE TPAMI 23(2):228–233
Howland P, Jeon M, Park H (2003) Cluster structure preserving dimension reduction based on the generalized singular value decomposition. SIAM J Matrix Anal Appl 25(1):165–179
Schmid C, Mohr R, Bauckhage C (2000) Evaluation of interest point detectors. Int J Comput Vis 37(2):151–172
Sim T, Sukthankar R, Mullin M, Baluja S (2000) Memory-based face recognition for visitor identification. In Proc. 4th Intl. Conf. on FG'00, pp 214–220
Smith J (1997) Integrated spatial and feature image systems: retrieval and compression. PhD thesis, Graduate School of Arts and Sciences, Columbia University, New York
Swain M, Ballard D (1991) Color indexing. Int J Comput Vis 7:11–32
Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86
Ye J, Janardan R, Park C, Park H (2003) A new optimization criterion for generalized discriminant analysis on undersampled problems. Technical Report TR-026-03, Department of Computer Science and Engineering, University of Minnesota, Twin Cities, U.S.A.
Ye J, Janardan R, Park C, Park H (2003) A new optimization criterion for generalized discriminant analysis on undersampled problems. In IEEE Intl. Conf. on Data Mining, pp 419–426
Zhang Z (1999) Feature-based facial expression recognition: experiments with a multi-layer perceptron. Int J Pattern Recogn Artif Intell 13(6):893–911
Zhao W, Chellappa R, Rosenfeld A, Phillips P (2000) Face recognition: a literature survey. Technical Report CAR-TR-948
Appendix
1.1 Generalized discriminant analysis using GSVD
In this Appendix, we first complete the formulation of the optimization problem (5.10) and then prove Theorem 5.1.
From Eq. 5.9, we have
Hence
where the matrix \(\tilde{G} =(X^{-1} G)^T\).
We will use the above representations for \(S_b^L\) and \(S_w^L\) in Section A for the minimizations of \(F\).
We first formulate the optimization problem in Eq. 5.8 as the following:
Recall by Eq. A.11,
Let \(u_{ij}\) denote the \((i,j)\)-th entry of the matrix \(\tilde{G}^T \tilde{G}\); then
Since \(\alpha_i^2 + \beta_i^2 = 1\), for \( r+1 \le i \le r+s \), we have
hence \(\mbox{trace} ( S_w^L)=\sum_{i=1}^t u_{ii}-\mbox{trace} ( S_b^L)=\sum_{i=1}^t u_{ii} -1\).
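The step above can be written out in full. The following sketch assumes the standard GSVD normalization, under which \(\alpha_i^2 + \beta_i^2 = 1\) holds for every \(1 \le i \le t\) (trivially for \(i \le r\), where \(\alpha_i = 1, \beta_i = 0\), and for \(i > r+s\), where \(\alpha_i = 0, \beta_i = 1\)), together with the representations \(\mbox{trace}(S_b^L) = \sum_{i=1}^t \alpha_i^2 u_{ii}\) and \(\mbox{trace}(S_w^L) = \sum_{i=1}^t \beta_i^2 u_{ii}\):

```latex
\begin{aligned}
\mbox{trace}(S_b^L) + \mbox{trace}(S_w^L)
  &= \sum_{i=1}^{t} \alpha_i^2 u_{ii} + \sum_{i=1}^{t} \beta_i^2 u_{ii}
   = \sum_{i=1}^{t} \bigl(\alpha_i^2 + \beta_i^2\bigr)\, u_{ii}
   = \sum_{i=1}^{t} u_{ii},\\
\mbox{trace}(S_w^L)
  &= \sum_{i=1}^{t} u_{ii} - \mbox{trace}(S_b^L).
\end{aligned}
```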
Therefore the original optimization (5.8) is equivalent to the following
Now we begin the proof of Theorem 5.1. First, note that the \(u_{ii}\) are diagonal entries of the positive semi-definite matrix \(\tilde{G}^T \tilde{G}\), hence nonnegative. Moreover, if \(u_{ii} = 0\) for some \(i\), then \(u_{ij} = u_{ji} = 0\) for every \(j\).
Recall that the matrix \(\tilde{G}^T \tilde{G}\) is an \(m \times m\) matrix with \(m\) diagonal entries. However, only the first \(t\) diagonal entries \(\{u_{ii}\}_{i=1}^t\) appear in the optimization problem (5.10); hence the last \(m-t\) diagonal entries of \(\tilde{G}^T \tilde{G}\) do not affect it. For simplicity, we set these last \(m-t\) diagonal entries to zero, i.e., \(u_{ii}=0\) for \(i = t+1, \ldots, m\).
For \(\{u_{ii}\}_{i=r+s+1}^t\), any positive value of \(u_{ii}\), for \(r+s+1 \le i \le t\), would increase the objective function in Eq. 5.10 while leaving the constraint unchanged. Hence \(u_{ii}=0\) for \(r+s+1 \le i \le t\), and Theorem 5.1 follows.
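The construction analyzed above can be sketched end-to-end in numpy, following the LDA/GSVD algorithm of Howland, Jeon, and Park cited in the References. This is a sketch only: the function name, the rank tolerance, and the default choice of \(k-1\) output dimensions are assumptions of this illustration rather than details fixed by the text.

```python
import numpy as np

def gsvd_lda(A, labels, dim=None):
    """Class-preserving dimension reduction via the LDA/GSVD route.

    A: (n, m) data matrix, one sample per row; labels: n class labels.
    Returns a projection G of shape (m, dim); reduced data is A @ G.
    """
    A = np.asarray(A, float)
    labels = np.asarray(labels)
    n, m = A.shape
    classes = sorted(set(labels.tolist()))
    k = len(classes)
    c = A.mean(axis=0)                       # global centroid
    Hb, Hw = [], []
    for cls in classes:
        Ai = A[labels == cls]
        ci = Ai.mean(axis=0)                 # class centroid
        Hb.append(np.sqrt(len(Ai)) * (ci - c))
        Hw.append(Ai - ci)
    # Stack the between- and within-class factors: K = [Hb^T; Hw^T]
    K = np.vstack([np.vstack(Hb), np.vstack(Hw)])      # (k + n, m)
    P, s, Qt = np.linalg.svd(K, full_matrices=True)
    t = int(np.sum(s > 1e-10 * s[0]))        # numerical rank of K
    R = s[:t]
    # SVD of the first k rows of P restricted to its first t columns
    _, _, Wt = np.linalg.svd(P[:k, :t])
    # X = Q_1 R^{-1} W maps data to the generalized singular vector basis
    X = Qt.T[:, :t] @ (Wt.T / R[:, None])
    if dim is None:
        dim = k - 1                          # r + s <= k - 1 columns suffice
    return X[:, :dim]
```

Note that the algorithm never forms \(S_b\) or \(S_w\) explicitly and does not require \(S_w\) to be nonsingular, which is what makes it applicable to undersampled problems (\(m > n\)).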
Cite this article
Li, Q., Ye, J. & Kambhamettu, C. Spatial interest pixels (SIPs): useful low-level features of visual media data. Multimed Tools Appl 30, 89–108 (2006). https://doi.org/10.1007/s11042-006-0009-3