Spatial interest pixels (SIPs): useful low-level features of visual media data

Abstract

Visual media data, such as images, are the raw data representation for many important applications. Reducing the dimensionality of raw visual media data is desirable, since high dimensionality degrades both the effectiveness and the efficiency of visual recognition algorithms. We present a comparative study of spatial interest pixels (SIPs), including eight-way (a novel SIP detector), Harris, and Lucas-Kanade, whose extraction is an important step in reducing the dimensionality of visual media data. Through extensive case studies, we show the usefulness of SIPs as low-level features of visual media data. A class-preserving dimension reduction algorithm (based on the generalized singular value decomposition, GSVD) is then applied to further reduce the dimension of SIP-based feature vectors. The experiments show its superiority over PCA.
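To make the use of SIPs as low-level features concrete, here is a minimal sketch of extracting Harris-style interest pixels from a grayscale image with OpenCV. It is illustrative only: the input file name, block size, and threshold are placeholders, not the detectors or parameter settings evaluated in the paper.

```python
# Minimal sketch: Harris-style spatial interest pixels (SIPs) with OpenCV.
# The input image and all parameters below are illustrative placeholders.
import cv2
import numpy as np

img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)        # hypothetical input image
gray = np.float32(img)

# Harris corner response at every pixel (neighborhood size, Sobel aperture, k).
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep pixels whose response exceeds a fraction of the maximum response.
sips = np.argwhere(response > 0.01 * response.max())      # (row, col) coordinates
print(f"{len(sips)} interest pixels detected")
```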

Notes

  1. We strictly distinguish the term feature from the term feature vector in the context of media-based classification applications. The former refers to color, texture, shape, and pixels, whereas the latter refers to the representation of an image/video instance that is ready to be fed into a classifier.

  2. In the context of image retrieval or 3D computer vision, these are called interest points. We rename them interest pixels to avoid confusion between an image point and a data point (i.e., a feature vector).

  3. http://www.mis.atr.co.jp/~mlyons/jaffe.html

  4. http://cvc.yale.edu/projects/yalefaces/yalefaces.html

  5. http://rvl1.ecn.purdue.edu/~aleix/aleix_face_DB.html

  6. http://pics.psych.stir.ac.uk/

  7. In ten-fold cross validation, the entire dataset is first split into ten pieces, and the test is then run ten times. Each time, nine pieces are used as training data and the remaining piece is used as test data. The final accuracy estimate is the mean of the ten per-run estimates (a minimal sketch follows these notes).

  8. Eigenspace and Fisherspace refer to the reduced spaces obtained via PCA and LDA (either classical or generalized), respectively (a second sketch contrasting the two also follows these notes).
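The following sketch, referenced in note 7, illustrates ten-fold cross validation with scikit-learn; the dataset and classifier are placeholders of ours, not the experimental setup of the paper.

```python
# Minimal sketch of ten-fold cross validation.
# The dataset and classifier are placeholders, not the paper's setup.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)        # stand-in for SIP-based feature vectors
clf = KNeighborsClassifier(n_neighbors=1)  # stand-in classifier

# Ten folds: train on nine pieces, test on the remaining one, ten times;
# the final estimate is the mean over the ten runs.
scores = cross_val_score(clf, X, y, cv=10)
print("mean accuracy:", scores.mean())
```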
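The second sketch, referenced in note 8, contrasts the two reduced spaces using scikit-learn's classical PCA and LDA; it is illustrative only and does not implement the generalized (GSVD-based) LDA studied in the paper.

```python
# Sketch: Eigenspace (PCA) versus Fisherspace (classical LDA).
# Classical LDA only; the paper's generalized LDA via GSVD also covers the
# undersampled case, which this sketch does not address.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)

X_eigen = PCA(n_components=9).fit_transform(X)                              # Eigenspace
X_fisher = LinearDiscriminantAnalysis(n_components=9).fit_transform(X, y)   # Fisherspace

print("Eigenspace:", X_eigen.shape, "Fisherspace:", X_fisher.shape)
```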

Author information

Correspondence to Qi Li.

Appendix

1.1 Generalized discriminant analysis using GSVD

In this Appendix, we will first complete the formulation of the optimization problem (5.10), and then give the proof of Theorem 5.1.

From Eq. 5.9, we have

$$\begin{aligned}
X^{T} H_{b} H_{b}^{T} X &= \begin{bmatrix} \Sigma_{1}^{T} \\ 0 \end{bmatrix} U^{T} U \begin{bmatrix} \Sigma_{1} & 0 \end{bmatrix}
= \begin{bmatrix} \Sigma_{1}^{T}\Sigma_{1} & 0 \\ 0 & 0 \end{bmatrix} \equiv D_{1}, \\
X^{T} H_{w} H_{w}^{T} X &= \begin{bmatrix} \Sigma_{2}^{T} \\ 0 \end{bmatrix} V^{T} V \begin{bmatrix} \Sigma_{2} & 0 \end{bmatrix}
= \begin{bmatrix} \Sigma_{2}^{T}\Sigma_{2} & 0 \\ 0 & 0 \end{bmatrix} \equiv D_{2}.
\end{aligned}$$

Hence

$$\begin{aligned}
S_{b}^{L} &= G^{T} S_{b} G = G^{T} H_{b} H_{b}^{T} G = \widetilde{G} D_{1} \widetilde{G}^{T}, \\
S_{w}^{L} &= G^{T} S_{w} G = G^{T} H_{w} H_{w}^{T} G = \widetilde{G} D_{2} \widetilde{G}^{T},
\end{aligned}$$
(A.11)

where the matrix \(\tilde{G} =(X^{-1} G)^T\).

We will use the above representations for \(S_b^L\) and \(S_w^L\) in the remainder of this appendix for the minimization of \(F\).

We first formulate the optimization problem in Eq. 5.8 as follows:

$$\begin{aligned}
& \text{minimize} \quad \operatorname{trace}\!\left( S_{w}^{L} \right) \\
& \text{subject to} \quad \operatorname{trace}\!\left( S_{b}^{L} \right) = 1.
\end{aligned}$$
(A.12)

Recall by Eq. A.11,

$$\begin{aligned}
\operatorname{trace}\!\left( S_{b}^{L} \right) &= \operatorname{trace}\!\left( \widetilde{G} D_{1} \widetilde{G}^{T} \right) = \operatorname{trace}\!\left( D_{1} \widetilde{G}^{T} \widetilde{G} \right), \\
\operatorname{trace}\!\left( S_{w}^{L} \right) &= \operatorname{trace}\!\left( \widetilde{G} D_{2} \widetilde{G}^{T} \right) = \operatorname{trace}\!\left( D_{2} \widetilde{G}^{T} \widetilde{G} \right).
\end{aligned}$$
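The derivations of Eqs. A.13 and A.14 below use only the diagonal structure of \(D_1\) and \(D_2\). Spelled out (this explicit form is our reading of the standard GSVD ordering, consistent with the sums below, with \(t\) the largest index at which \(D_1\) or \(D_2\) has a nonzero diagonal entry):

$$D_{1} = \operatorname{diag}\bigl(\underbrace{1,\ldots,1}_{r},\; \alpha_{r+1}^{2},\ldots,\alpha_{r+s}^{2},\; 0,\ldots,0\bigr),
\qquad
D_{2} = \operatorname{diag}\bigl(\underbrace{0,\ldots,0}_{r},\; \beta_{r+1}^{2},\ldots,\beta_{r+s}^{2},\; \underbrace{1,\ldots,1}_{t-r-s},\; 0,\ldots,0\bigr).$$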

Let \(u_{ij}\) be the \(ij\)-th entry of the matrix \(\tilde{G}^T \tilde{G}\); then

$$\operatorname{trace}\!\left( S_{b}^{L} \right) = \sum_{i=1}^{r} u_{ii} + \sum_{i=r+1}^{r+s} \alpha_{i}^{2} u_{ii} = 1,$$
(A.13)
$$\operatorname{trace}\!\left( S_{w}^{L} \right) = \sum_{i=r+1}^{r+s} \beta_{i}^{2} u_{ii} + \sum_{i=r+s+1}^{t} u_{ii}.$$
(A.14)

Since \(\alpha_i^2 + \beta_i^2 = 1\), for \( r+1 \le i \le r+s \), we have

$$\begin{aligned}
\operatorname{trace}\!\left( S_{w}^{L} \right) + \operatorname{trace}\!\left( S_{b}^{L} \right)
&= \sum_{i=1}^{r} u_{ii} + \sum_{i=r+1}^{r+s} \left( \alpha_{i}^{2} + \beta_{i}^{2} \right) u_{ii} + \sum_{i=r+s+1}^{t} u_{ii} \\
&= \sum_{i=1}^{t} u_{ii},
\end{aligned}$$

hence \(\mbox{trace} ( S_w^L)=\sum_{i=1}^t u_{ii}-\mbox{trace} ( S_b^L)=\sum_{i=1}^t u_{ii} -1\).

Therefore the original optimization (5.8) is equivalent to the following

$$\begin{aligned}
& \text{minimize} \quad \operatorname{trace}\!\left( S_{w}^{L} \right) = \sum_{i=1}^{t} u_{ii} - 1 \\
& \text{subject to} \quad \operatorname{trace}\!\left( S_{b}^{L} \right) = \sum_{i=1}^{r} u_{ii} + \sum_{i=r+1}^{r+s} \alpha_{i}^{2} u_{ii} = 1.
\end{aligned}$$
(A.15)

Now we begin the proof of Theorem 5.1. First, note that the \(u_{ii}\) are diagonal elements of the positive semi-definite matrix \(\tilde{G}^T \tilde{G}\), and hence nonnegative. Moreover, since \(\tilde{G}^T \tilde{G}\) is positive semi-definite, if \(u_{ii} = 0\) for some \(i\), then \(u_{ij} = u_{ji} = 0\) for every \(j\).

Recall that \(\tilde{G}^T \tilde{G}\) is an \(m \times m\) matrix with \(m\) diagonal entries. However, only the first \(t\) diagonal entries \(\{u_{ii}\}_{i=1}^t\) appear in the optimization problem (5.10), so the last \(m-t\) diagonal entries of \(\tilde{G}^T \tilde{G}\) do not affect it. For simplicity, we set these last \(m-t\) diagonal entries to zero, i.e., \(u_{ii}=0\) for \(i = t+1, \cdots, m\).

For \(\{u_{ii}\}_{i=r+s+1}^t\), any positive value of \(u_{ii}\) with \(r+s+1 \le i \le t\) would increase the objective function in Eq. 5.10 while leaving the constraint unchanged. Hence \(u_{ii}=0\) for \(r+s+1 \le i \le t\), and Theorem 5.1 follows.
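As a side note, the identity \(\operatorname{trace}(S_w^L) + \operatorname{trace}(S_b^L) = \sum_{i=1}^t u_{ii}\) used above is easy to verify numerically. The sketch below is ours; it builds \(D_1\) and \(D_2\) with the diagonal structure assumed in Eqs. A.13 and A.14 and a random \(\widetilde{G}\), with arbitrary placeholder dimensions.

```python
# Numerical sanity check of trace(S_w^L) + trace(S_b^L) = sum_{i=1}^t u_ii.
# D1, D2 follow the assumed GSVD diagonal structure; dimensions are arbitrary
# placeholders, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
r, s, t, m, ell = 3, 4, 10, 15, 6                 # t >= r + s, m >= t; ell = reduced dim

theta = rng.uniform(0.1, 1.4, size=s)
alpha2, beta2 = np.cos(theta) ** 2, np.sin(theta) ** 2   # alpha_i^2 + beta_i^2 = 1

D1 = np.diag(np.concatenate([np.ones(r), alpha2, np.zeros(m - r - s)]))
D2 = np.diag(np.concatenate([np.zeros(r), beta2, np.ones(t - r - s), np.zeros(m - t)]))

G_tilde = rng.standard_normal((ell, m))           # plays the role of (X^{-1} G)^T
U = G_tilde.T @ G_tilde                           # u_ij = (G_tilde^T G_tilde)_ij

trace_b = np.trace(G_tilde @ D1 @ G_tilde.T)      # trace(S_b^L)
trace_w = np.trace(G_tilde @ D2 @ G_tilde.T)      # trace(S_w^L)

assert np.isclose(trace_b + trace_w, np.diag(U)[:t].sum())
print(trace_b + trace_w, np.diag(U)[:t].sum())
```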

Cite this article

Li, Q., Ye, J. & Kambhamettu, C. Spatial interest pixels (SIPs): useful low-level features of visual media data. Multimed Tools Appl 30, 89–108 (2006). https://doi.org/10.1007/s11042-006-0009-3
