Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks


Abstract

We present new methods for fast Gaussian process (GP) inference in large-scale scenarios, including exact multi-class classification with label regression, hyperparameter optimization, and uncertainty prediction. In contrast to previous approaches, we use a full Gaussian process model without sparse approximation techniques. Our methods are based on exploiting generalized histogram intersection kernels and their fast kernel multiplications. We empirically validate the suitability of our techniques in a wide range of scenarios with tens of thousands of examples. Whereas plain GP models are intractable due to both memory consumption and computation time in these settings, our results show that exact inference can indeed be done efficiently. As a consequence, we enable every important piece of the Gaussian process framework—learning, inference, hyperparameter optimization, variance estimation, and online learning—to be used in realistic scenarios with more than a handful of data points.
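
For illustration, the following minimal sketch (Python/NumPy; synthetic data and illustrative names, not the authors' implementation) shows the standard sorted-prefix-sum idea behind fast histogram intersection kernel multiplications, which goes back to Maji et al. (2008) and underlies the approach summarized above: the predictive mean \(\mathbf {\varvec{k}}_{*}^{\mathrm {T}} \mathbf {\varvec{\alpha }}\) is evaluated in \(O(D \log N)\) per test example instead of \(O(D N)\) for the naive evaluation.

    import numpy as np

    def hik_predictive_mean(X_sorted, prefix_ax, prefix_a, x_star):
        # k_*^T alpha for the histogram intersection kernel, using per-dimension
        # sorted feature values and prefix sums: O(D log N) per test example
        mu = 0.0
        for d in range(X_sorted.shape[1]):
            r = np.searchsorted(X_sorted[:, d], x_star[d])   # training values below x_*(d)
            mu += prefix_ax[r, d] + x_star[d] * (prefix_a[-1, d] - prefix_a[r, d])
        return mu

    rng = np.random.default_rng(0)
    N, D = 1000, 20
    X = rng.random((N, D)); X /= X.sum(axis=1, keepdims=True)   # L1-normalized histograms
    alpha = rng.standard_normal(N)             # stands in for (K + sigma^2 I)^{-1} y

    # precomputation, done once after training
    order = np.argsort(X, axis=0)
    X_sorted = np.take_along_axis(X, order, axis=0)
    alpha_sorted = alpha[order]                # alpha permuted separately per dimension
    prefix_ax = np.vstack([np.zeros(D), np.cumsum(alpha_sorted * X_sorted, axis=0)])
    prefix_a = np.vstack([np.zeros(D), np.cumsum(alpha_sorted, axis=0)])

    x_star = rng.random(D); x_star /= x_star.sum()
    mu_fast = hik_predictive_mean(X_sorted, prefix_ax, prefix_a, x_star)
    mu_naive = np.minimum(X, x_star).sum(axis=1) @ alpha
    print(np.isclose(mu_fast, mu_naive))       # True

Roughly speaking, the quantization discussed in the appendix replaces the binary search by a table lookup, which further reduces the per-example cost.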


Notes

  1. Note that in the remainder of the article, the term uncertainty refers to classification uncertainty, and not to the query strategy introduced by Kapoor et al. (2010).

  2. http://www.image-net.org/challenges/LSVRC/2011/ILSVRC2011_devkit-2.0.tar.gz.

References

  • Ablavsky, V., & Sclaroff, S. (2011). Learning parameterized histogram kernels on the simplex manifold for image and action classification. In IEEE international conference of computer vision (ICCV) pp. 1473–1480.

  • Bai, Z., & Golub, G. H. (1997). Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Annals of Numerical Mathematics, 4(1–4), 29–38.

  • Barla, A., Odone, F., & Verri, A. (2003). Histogram intersection kernel for image classification. In IEEE international conference on image processing (ICIP), pp. 513–516.

  • Berg, A.C., Deng, J., & Fei-Fei, L. (2010). Large scale visual recognition challenge. http://www.imagenet.org/challenges/LSVRC/2010/.

  • Bo, L., & Sminchisescu, C. (2008). Greedy block coordinate descent for large scale gaussian process regression. In Uncertainty in artificial intelligence (UAI).

  • Bo, L., & Sminchisescu, C. (2010). Twin gaussian processes for structured prediction. International Journal of Computer Vision (IJCV), 87(1–2), 28–52.

  • Bo, L., & Sminchisescu, C. (2012). Greedy block coordinate descent for large scale gaussian process regression. Computing Research Repository (CoRR) abs/1206.3238, previous publication in Uncertainty in Artificial Intelligence (UAI).

  • Bonilla, E.V., Chai, K.M.A., & Williams, C.K.I. (2008). Multi-task gaussian process prediction. In Advances in Neural Information Processing Systems (NIPS) (pp. 153–160). Cambridge: MIT Press.

  • Bottou, L., Chapelle, O., DeCoste, D., & Weston, J. (Eds.). (2007). Large-scale kernel machines. Cambridge: MIT Press.

  • Boughorbel, S., Tarel, J.P., & Boujemaa, N. (2005). Generalized histogram intersection kernel for image recognition. In IEEE international conference on image processing (ICIP), pp. 161–164.

  • Chang, C. C., & Lin, C. J. (2011). Libsvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–27.

  • Deng, J., Berg, A., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In European Conference on Computer Vision (ECCV), pp. 71–84.

  • Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531.

  • Ebert, S., Fritz, M., & Schiele, B. (2012). Ralf: A reinforced active learning formulation for object class recognition. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 3626–3633.

  • Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. Journal of Machine Learning Research (JMLR), 9, 1871–1874.

  • Freytag, A., Fröhlich, B., Rodner, E., & Denzler, J. (2012a). Efficient semantic segmentation with gaussian processes and histogram intersection kernels. In International conference on pattern recognition (ICPR), pp. 3313–3316.

  • Freytag, A., Rodner, E., Bodesheim, P., & Denzler, J. (2012b). Rapid uncertainty computation with gaussian processes and histogram intersection kernels. In: Asian conference on computer vision (ACCV), pp. 511–524.

  • Freytag, A., Rodner, E., Bodesheim, P., & Denzler, J. (2013). Labeling examples that matter: Relevance-based active learning with gaussian processes. In German conference on pattern recognition (GCPR), pp. 282–291.

  • Freytag, A., Rodner, E., & Denzler, J. (2014a). Selecting influential examples: Active learning with expected model output changes. In: European conference on computer vision (ECCV).

  • Freytag, A., Rühle, J., Bodesheim, P., Rodner, E., & Denzler, J. (2014b). Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods. In: International conference on pattern recognition (ICPR)—FEAST workshop.

  • Grauman, K., & Darrell, T. (2007). The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research (JMLR), 8(Apr), 725–760.

  • He, H., & Siu, W.C. (2011). Single image super-resolution using gaussian process regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 449–456.

  • Hestenes, M. R., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409–436.

  • Käding, C., Freytag, A., Rodner, E., Bodesheim, P., & Denzler, J. (2015). Active learning and discovery of object categories in the presence of unnameable instances. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 4343–4352.

  • Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2010). Gaussian processes for object categorization. International Journal of Computer Vision (IJCV), 88(2), 169–188.

  • Kemmler, M., Rodner, E., & Denzler, J. (2010). One-class classification with gaussian processes. In: Asian conference on computer vision (ACCV), vol. 2, pp. 489–500.

  • Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS).

  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 2169–2178.

  • Maji, S., Berg, A.C., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8.

  • Maji, S., Berg, A. C., & Malik, J. (2013). Efficient classification for additive kernel svms. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(1), 66–77.

  • Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7(4), 308–313.

  • Nickisch, H., & Rasmussen, C. E. (2008). Approximations for binary gaussian process classification. Journal of Machine Learning Research, 9(10), 2035–2078.

  • Nocedal, J., & Wright, S. J. (2006). Conjugate gradient methods. New York: Springer.

  • Perronnin, F., Akata, Z., Harchaoui, Z., & Schmid, C. (2012). Towards good practice in large-scale learning for image classification. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 3482–3489.

  • Pillonetto, G., Dinuzzo, F., & Nicolao, G. D. (2010). Bayesian online multitask learning of gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(2), 193–205.

  • Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 413–420.

  • Quiñonero Candela, J., & Rasmussen, C. E. (2005). A unifying view of sparse approximate gaussian process regression. Journal of Machine Learning Research (JMLR), 6, 1939–1959.

  • Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Adaptive computation and machine learning. Cambridge: The MIT Press.

  • Rodner, E. (2011). Learning from few examples for visual recognition problems. Hut Verlag München, http://herakles.inf-cv.uni-jena.de/biborb/bibs/cv/papers/Rodner11:Diss.pdf.

  • Rodner, E., Hegazy, D., & Denzler, J. (2010). Multiple kernel gaussian process classification for generic 3d object recognition from time-of-flight images. In: International conference on image and vision computing New Zealand (IVCNZ), pp. 1–8.

  • Rodner, E., Freytag, A., Bodesheim, P., & Denzler, J. (2012). Large-scale gaussian process classification with flexible adaptive histogram kernels. In European conference on computer vision (ECCV), vol. 4, pp. 85–98.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115, 1–42.

  • Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge: MIT Press.

  • Snoek, J., Larochelle, H., & Adams, R.P. (2012). Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (NIPS), pp. 2951–2959.

  • Sun, J., & Ponce, J. (2013). Learning discriminative part detectors for image classification and cosegmentation. In IEEE International conference on computer vision (ICCV).

  • Tong, S., & Koller, D. (2001). Support vector machine active learning with applications to text classification. Journal of Machine Learning Research (JMLR), 2(Nov), 45–66.

  • Urtasun, R., & Darrell, T. (2008). Sparse probabilistic regression for activity-independent human pose inference. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8.

  • Vapnik, V. (1998). Statistical learning theory (Vol. 2). New York: Wiley.

  • Vázquez, D., Marin, J., Lopez, A. M., Geronimo, D., & Ponsa, D. (2014). Virtual and real world adaptation for pedestrian detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(4), 797–809.

  • Vedaldi, A., & Zisserman, A. (2010). Efficient additive kernels via explicit feature maps. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 3539–3546.

  • Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In IEEE international conference on computer vision (ICCV), pp. 606–613.

  • Wang, G., Hoiem, D., & Forsyth, D. (2012). Learning image similarity from flickr groups using fast kernel machines. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(11), 2177–2188.

  • Williams, C.K.I., & Seeger, M. (2000). The effect of the input density distribution on kernel-based classifiers. In: International Conference on Machine Learning (ICML), pp. 1159–1166.

  • Wu, J. (2010). A fast dual method for hik svm learning. In: European conference on computer vision (ECCV), pp. 552–565.

  • Wu, J. (2012). Efficient hik svm learning for image classification. IEEE Transactions on Image Processing (TIP), 21(10), 4442–4453.

  • Yuan, Q., Thangali, A., Ablavsky, V., & Sclaroff, S. (2008). Multiplicative kernels: Object detection, segmentation and pose estimation. In Computer vision and pattern recognition (CVPR), IEEE, pp. 1–8.

  • Yuster, R. (2008). Matrix sparsification for rank and determinant computations via nested dissection. In: IEEE symposium on foundations of computer science (FOCS), pp. 137–145.

Acknowledgments

This research was partially supported by grant DE 735/10-1 of the German Research Foundation (DFG).

Author information

Corresponding author

Correspondence to Erik Rodner.

Additional information

Communicated by Raquel Urtasun.

Appendices

Quantization Error Analysis

Proof of Theorem 1

The quantization trick approximates a new example \(\mathbf {\varvec{x}}_*\) with a quantized version \(\tilde{\mathbf {\varvec{x}}}_*\). In the following, we assume that a quantization with \(q\) bins is given for each of the \(D\) dimensions. For each input value \(\mathbf {\varvec{x}}\left( d\right) \in [l_d, u_d ]\), we can then compute a quantized value \(\tilde{z}\). The maximum quantization error \(\varepsilon _{q}\) is defined as follows:

$$\begin{aligned} \varepsilon _{q}&= \max _{1 \le d \le D} \; \max _{\mathbf {\varvec{x}}\left( d\right) \in [l_d, u_d ]} | \mathbf {\varvec{x}}\left( d\right) - \tilde{z} |. \end{aligned}$$
(36)

Our goal is to analyze the difference between the exact predictive mean \(\mu _*\) and the one computed with the quantization trick \(\tilde{\mu }_*\):

$$\begin{aligned} | \mu _*- \tilde{\mu }_*|&= | \mathbf {\varvec{k}}_{*}^{\mathrm {T}} \mathbf {\varvec{\alpha }} - \tilde{\mathbf {\varvec{k}}}_*^{\mathrm {T}} \mathbf {\varvec{\alpha }} | \nonumber \\&= | (\mathbf {\varvec{k}}_{*}- \tilde{\mathbf {\varvec{k}}}_*)^{\mathrm {T}} \mathbf {\varvec{\alpha }} |\le \sum \limits _{i=1}^N| \triangle k\left( i\right) | | \mathbf {\varvec{\alpha }}\left( i\right) | \end{aligned}$$
(37)

where we have defined \(\triangle k\left( i\right) = \kappa (\mathbf {\varvec{x}}_{i}, \mathbf {\varvec{x}}_*) - \kappa (\mathbf {\varvec{x}}_{i}, \tilde{\mathbf {\varvec{x}}}_*)\) as an abbreviation. Let us analyze this term in more detail using the relation \(\min (a,b) = \frac{1}{2} \cdot (a + b - |a-b|)\):

$$\begin{aligned} | \triangle k\left( i\right) |&= | \kappa (\mathbf {\varvec{x}}_{i}, \mathbf {\varvec{x}}_*) - \kappa (\mathbf {\varvec{x}}_{i}, \tilde{\mathbf {\varvec{x}}}_*) | \nonumber \\&= \left| \sum \limits _{d=1}^D\min (\mathbf {\varvec{x}}_i\left( d\right) , \mathbf {\varvec{x}}_*\left( d\right) ) - \min (\mathbf {\varvec{x}}_i\left( d\right) , \tilde{\mathbf {\varvec{x}}}_*\left( d\right) ) \right| \nonumber \\&= \frac{1}{2} \left| \sum \limits _{d=1}^D\mathbf {\varvec{x}}_i\left( d\right) + \mathbf {\varvec{x}}_*\left( d\right) - |\mathbf {\varvec{x}}_i\left( d\right) - \mathbf {\varvec{x}}_*\left( d\right) |\right. \ldots \nonumber \\&\quad \left. \ldots -\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) + |\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) | \right| \nonumber \\&= \frac{1}{2} \left| \sum \limits _{d=1}^D\mathbf {\varvec{x}}_*\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) - |\mathbf {\varvec{x}}_i\left( d\right) - \mathbf {\varvec{x}}_*\left( d\right) | + \right. \ldots \nonumber \\&\quad \left. \ldots |\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) | \right| . \end{aligned}$$
(38)

We can now exploit the fact that \(\mathbf {\varvec{x}}_*\) is normalized to sum up to a constant value, e.g., 1. Furthermore, we assume that this also holds for \(\tilde{\mathbf {\varvec{x}}}_*\). This assumption is reasonable for a high dimension \(D\) and a quantization error with zero mean. Putting both aspects together, we obtain the following equality:

$$\begin{aligned} | \triangle k\left( i\right) |&= \frac{1}{2} \left| \sum \limits _{d=1}^D|\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) | - |\mathbf {\varvec{x}}_i\left( d\right) - \mathbf {\varvec{x}}_*\left( d\right) | \right| \nonumber \\&= \frac{1}{2} \left| \Vert \mathbf {\varvec{x}}_{i} - \tilde{\mathbf {\varvec{x}}}_*\Vert _1 - \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 \right| \end{aligned}$$
(39)

Due to the symmetry of this expression with respect to \(\mathbf {\varvec{x}}_*\) and \(\tilde{\mathbf {\varvec{x}}}_*\), we can assume without loss of generality that \(\Vert \mathbf {\varvec{x}}_{i} - \tilde{\mathbf {\varvec{x}}}_*\Vert _1 \ge \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1\). This allows us to apply the triangle inequality, and we finally obtain:

$$\begin{aligned} | \triangle k\left( i\right) |&= \frac{1}{2} \left( \Vert \mathbf {\varvec{x}}_{i} - \tilde{\mathbf {\varvec{x}}}_*\Vert _1 - \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 \right) \nonumber \\&\le \frac{1}{2} \left( \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 + \Vert \mathbf {\varvec{x}}_*- \tilde{\mathbf {\varvec{x}}}_*\Vert _1 - \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 \right) \nonumber \\&= \frac{1}{2} \Vert \mathbf {\varvec{x}}_*- \tilde{\mathbf {\varvec{x}}}_*\Vert _1. \end{aligned}$$
(40)

This directly leads to \(|\triangle k\left( i\right) | \le \frac{D\cdot \varepsilon _{q}}{2}\). Combined with Eq. (37), this proves the final result of the theorem.

The bound can also be improved by considering the sum of the quantization errors over all dimensions, but we omit this refinement for brevity and ease of notation. \(\square \)
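
As a sanity check, the following minimal sketch (Python/NumPy; synthetic data, illustrative only, not the code used in the paper) quantizes a test example, re-normalizes it so that the equal-sum assumption used above holds exactly, and verifies that the deviation of the predictive mean stays below the bound \(\frac{D\cdot \varepsilon _{q}}{2} \sum _i |\mathbf {\varvec{\alpha }}\left( i\right) |\):

    import numpy as np

    rng = np.random.default_rng(0)
    N, D, q = 200, 50, 16                      # training size, dimensionality, bins per dimension

    def hik(X, y):
        # histogram intersection kernel values kappa(x_i, y) for all rows x_i of X
        return np.minimum(X, y).sum(axis=1)

    # L1-normalized histogram features (rows sum to one) and a surrogate weight vector alpha
    X = rng.random((N, D)); X /= X.sum(axis=1, keepdims=True)
    x_star = rng.random(D); x_star /= x_star.sum()
    alpha = rng.standard_normal(N)             # stands in for (K + sigma^2 I)^{-1} y

    # quantize each dimension to q uniform bins and re-normalize to keep the sum equal to one
    step = X.max(axis=0) / q
    x_tilde = np.round(x_star / step) * step
    x_tilde /= x_tilde.sum()
    eps_q = np.max(np.abs(x_star - x_tilde))   # maximum quantization error

    mu_exact = hik(X, x_star) @ alpha
    mu_quant = hik(X, x_tilde) @ alpha
    bound = 0.5 * D * eps_q * np.abs(alpha).sum()
    print(abs(mu_exact - mu_quant), "<=", bound)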

Proof of the Bound

Proof of Lemma 1

First note that due to the conditions of the lemma, the following holds: \(1 \le \mu _2 < \mu _1 \le N\) and \(\bar{t}> 0\). Furthermore, the bound is only valid for \(\beta \ne \bar{t}\), because otherwise the \(2 \times 2\) matrix within the bound would be singular.

We now start by deriving the coefficients for \(\mu _1\) and \(\mu _2\). The first part of Eq. (19) can be written as:

$$\begin{aligned}&\begin{bmatrix} \log \beta ,&\log \bar{t}\end{bmatrix} \begin{bmatrix} \beta&\overline{t} \\ \beta ^2&\overline{t}^2 \end{bmatrix}^{-1} \nonumber \\&\quad = \begin{bmatrix} \log \beta ,&\log \bar{t}\end{bmatrix} \left( \frac{1}{\beta \bar{t}^2 - \bar{t}\beta ^2} \begin{bmatrix} \bar{t}^2&-\bar{t}\\ -\beta ^2&\beta \end{bmatrix} \right) \nonumber \\&\quad = \frac{1}{\beta \bar{t}^2 - \bar{t}\beta ^2} \begin{bmatrix} \bar{t}^2 \log \beta - \beta ^2 \log \bar{t},&\beta \log \bar{t}- \bar{t}\log \beta \end{bmatrix} \nonumber \\&\quad = \frac{1}{\bar{t}- \beta } \begin{bmatrix} \dfrac{\log \beta }{\beta } \bar{t}- \dfrac{\log \bar{t}}{\bar{t}} \beta ,&\dfrac{\log \bar{t}}{\bar{t}} - \dfrac{\log \beta }{\beta } \end{bmatrix}. \end{aligned}$$
(41)

Therefore, we get the following short form of Eq. (19) with \(\beta =1\):

(42)

Let \(\tilde{\mu }_2\) with \(0 < \tilde{\mu }_2 \le \mu _2\) be a lower bound of the squared Frobenius norm. If we replace \(\mu _2\) with \(\tilde{\mu }_2\) in Eq. (42), we notice that the log-term increases and the denominator of the second part decreases. This directly leads us to the validity of the lemma. \(\square \)
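
The matrix inversion in Eq. (41) is easy to verify numerically; a minimal sketch (Python/NumPy) with arbitrary illustrative values for \(\beta \) and \(\bar{t}\), not taken from the paper:

    import numpy as np

    beta, tbar = 0.7, 2.3                      # arbitrary values with beta != tbar and tbar > 0

    lhs = np.array([np.log(beta), np.log(tbar)]) @ np.linalg.inv(
        np.array([[beta, tbar], [beta**2, tbar**2]]))

    # closed form from the last line of Eq. (41)
    rhs = np.array([np.log(beta) / beta * tbar - np.log(tbar) / tbar * beta,
                    np.log(tbar) / tbar - np.log(beta) / beta]) / (tbar - beta)

    print(np.allclose(lhs, rhs))               # True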

Proof of Lemma 2

(43)

Now, we show that the second term equals \(N\cdot \log \gamma \) by using the definition of \(\bar{t}\) and the weights for \(\mu _1\) and \(\mu _2\) as derived at the beginning of the proof of Lemma 1:

(44)

\(\square \)

Proof of Theorem 2

The first part of the inequality was proved by Bai and Golub (1997). The second part follows directly by applying Lemma 2 with \(\gamma = \frac{1}{\beta }\) and then using Lemma 1:

figure a

\(\square \)

Bounds on Quadratic Forms

From linear algebra we know that any real-valued, symmetric matrix \(\varvec{M} \in {\mathbb {R}}^{N\times N}\) can be decomposed as \(\varvec{M} = \varvec{U} \varvec{D} \varvec{U}^{\mathrm {T}}\), where \(\varvec{D}\) is a diagonal matrix containing the (real) eigenvalues of \(\varvec{M}\) and \(\varvec{U}\) is an orthogonal matrix of the same size as \(\varvec{M}\). Therefore, for any vector \(\mathbf {\varvec{x}}\in {\mathbb {R}}^{N}\) the following holds:

$$\begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}= \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{U} \varvec{D} \varvec{U}^{\mathrm {T}} \mathbf {\varvec{x}}= \tilde{\mathbf {\varvec{x}}}^{\mathrm {T}} \varvec{D} \tilde{\mathbf {\varvec{x}}} = \sum _{i = 1}^{N} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2. \end{aligned}$$
(46)

Here, \(\lambda _i\) denotes the \(i\)th eigenvalue of \(\varvec{M}\) in decreasing order, i.e., \(\lambda _1 \ge \cdots \ge \lambda _{N}\), and \(\tilde{\mathbf {\varvec{x}}} = \varvec{U}^{\mathrm {T}} \mathbf {\varvec{x}}\), so that \(\tilde{\mathbf {\varvec{x}}} \left( i\right) \) is the projection of \(\mathbf {\varvec{x}}\) onto the \(i\)th column of \(\varvec{U}\), which is the \(i\)th eigenvector of \(\varvec{M}\). As a result, we can bound the quadratic form in Eq. (46) as follows:

$$\begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}&= \sum _{i = 1}^{k} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 + \sum _{j = k+1}^{N} \lambda _j \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2 \end{aligned}$$
(47)
$$\begin{aligned}&\le \sum _{i = 1}^{k} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 + \lambda _{k+1} \sum _{j = k+1}^{N} \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2. \end{aligned}$$
(48)

Since \(\varvec{U}\) is orthogonal, multiplication with \(\varvec{U}\) or \(\varvec{U}^{\mathrm {T}}\) does not change the length of a vector, i.e., \(\left| \left| \varvec{U} \mathbf {\varvec{x}}\right| \right| = \left| \left| \mathbf {\varvec{x}}\right| \right| \) and hence \(\left| \left| \tilde{\mathbf {\varvec{x}}}\right| \right| = \left| \left| \mathbf {\varvec{x}}\right| \right| \). Therefore, we can obtain the following upper bound:

$$\begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}\le \sum _{i = 1}^{k} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 + \lambda _{k+1} \Bigl (\left| \left| \mathbf {\varvec{x}}\right| \right| ^2 - \sum _{i = 1}^{k} \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 \Bigr ). \end{aligned}$$
(49)

Analogously, we obtain the following lower bound by considering the \(k\) smallest eigenvalues of \(\varvec{M}\) and bounding the remaining eigenvalues with the \((k+1)\)th smallest:

$$\begin{aligned} \begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}&\ge \sum _{j = N-k+1}^{N} \lambda _j \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2 \\&\quad + \lambda _{N-k} \Bigl (\left| \left| \mathbf {\varvec{x}}\right| \right| ^2 - \sum _{j = N-k+1}^{N} \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2 \Bigr ). \end{aligned} \end{aligned}$$
(50)

For the special case of \(k= 0\) in Eqs. (49) and (50), we obtain the well-known bounds for any positive definite matrix \(\varvec{M}\) and any vector \(\mathbf {\varvec{x}}\) of corresponding size:

$$\begin{aligned} \lambda _{N}(\varvec{M}) \left| \left| \mathbf {\varvec{x}}\right| \right| ^2 \ \le \ \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}\ \le \ \lambda _{1}(\varvec{M}) \left| \left| \mathbf {\varvec{x}}\right| \right| ^2. \end{aligned}$$
(51)
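
These bounds are straightforward to check numerically. A minimal sketch (Python/NumPy; the matrix and vector are synthetic and purely illustrative) evaluating Eqs. (49)-(51) for a symmetric positive definite matrix:

    import numpy as np

    rng = np.random.default_rng(1)
    N, k = 8, 3

    A = rng.standard_normal((N, N))
    M = A @ A.T + N * np.eye(N)                # symmetric positive definite matrix
    x = rng.standard_normal(N)

    lam, U = np.linalg.eigh(M)                 # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]             # reorder: lam[0] >= ... >= lam[N-1]
    xt = U.T @ x                               # projections onto the eigenvectors
    quad = x @ M @ x

    # Eq. (49): upper bound using the k largest eigenvalues
    upper = (lam[:k] * xt[:k]**2).sum() + lam[k] * (x @ x - (xt[:k]**2).sum())
    # Eq. (50): lower bound using the k smallest eigenvalues
    lower = (lam[N-k:] * xt[N-k:]**2).sum() + lam[N-k-1] * (x @ x - (xt[N-k:]**2).sum())
    # Eq. (51): special case k = 0
    assert lam[-1] * (x @ x) <= quad <= lam[0] * (x @ x)
    assert lower <= quad <= upper
    print(lower, quad, upper)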

About this article

Cite this article

Rodner, E., Freytag, A., Bodesheim, P. et al. Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks. Int J Comput Vis 121, 253–280 (2017). https://doi.org/10.1007/s11263-016-0929-y

