Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks


Abstract

We present new methods for fast Gaussian process (GP) inference in large-scale scenarios, including exact multi-class classification with label regression, hyperparameter optimization, and uncertainty prediction. In contrast to previous approaches, we use a full Gaussian process model without sparse approximation techniques. Our methods are based on exploiting generalized histogram intersection kernels and their fast kernel multiplications. We empirically validate the suitability of our techniques in a wide range of scenarios with tens of thousands of examples. Whereas plain GP models are intractable due to both memory consumption and computation time in these settings, our results show that exact inference can indeed be done efficiently. As a consequence, we enable every important piece of the Gaussian process framework—learning, inference, hyperparameter optimization, variance estimation, and online learning—to be used in realistic scenarios with more than a handful of data points.
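
For illustration, the following minimal sketch (Python/NumPy; synthetic data and illustrative names, not the authors' implementation) shows the standard sorted-prefix-sum idea behind fast histogram intersection kernel multiplications, which goes back to Maji et al. (2008) and underlies the approach summarized above: the predictive mean \(\mathbf {\varvec{k}}_{*}^{\mathrm {T}} \mathbf {\varvec{\alpha }}\) is evaluated in \(O(D \log N)\) per test example instead of \(O(D N)\) for the naive evaluation.

    import numpy as np

    def hik_predictive_mean(X_sorted, prefix_ax, prefix_a, x_star):
        # k_*^T alpha for the histogram intersection kernel, using per-dimension
        # sorted feature values and prefix sums: O(D log N) per test example
        mu = 0.0
        for d in range(X_sorted.shape[1]):
            r = np.searchsorted(X_sorted[:, d], x_star[d])   # training values below x_*(d)
            mu += prefix_ax[r, d] + x_star[d] * (prefix_a[-1, d] - prefix_a[r, d])
        return mu

    rng = np.random.default_rng(0)
    N, D = 1000, 20
    X = rng.random((N, D)); X /= X.sum(axis=1, keepdims=True)   # L1-normalized histograms
    alpha = rng.standard_normal(N)             # stands in for (K + sigma^2 I)^{-1} y

    # precomputation, done once after training
    order = np.argsort(X, axis=0)
    X_sorted = np.take_along_axis(X, order, axis=0)
    alpha_sorted = alpha[order]                # alpha permuted separately per dimension
    prefix_ax = np.vstack([np.zeros(D), np.cumsum(alpha_sorted * X_sorted, axis=0)])
    prefix_a = np.vstack([np.zeros(D), np.cumsum(alpha_sorted, axis=0)])

    x_star = rng.random(D); x_star /= x_star.sum()
    mu_fast = hik_predictive_mean(X_sorted, prefix_ax, prefix_a, x_star)
    mu_naive = np.minimum(X, x_star).sum(axis=1) @ alpha
    print(np.isclose(mu_fast, mu_naive))       # True

Roughly speaking, the quantization discussed in the appendix replaces the binary search by a table lookup, which further reduces the per-example cost.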


Notes

  1. Note that in the remainder of the article, the term uncertainty refers to classification uncertainty, and not to the query strategy introduced by Kapoor et al. (2010).

  2. http://www.image-net.org/challenges/LSVRC/2011/ILSVRC2011_devkit-2.0.tar.gz.

References

  • Ablavsky, V., & Sclaroff, S. (2011). Learning parameterized histogram kernels on the simplex manifold for image and action classification. In IEEE international conference of computer vision (ICCV) pp. 1473–1480.

  • Bai, Z., & Golub, G. H. (1997). Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Annals of Numerical Mathematics, 4(1–4), 29–38.

  • Barla, A., Odone, F., & Verri, A. (2003). Histogram intersection kernel for image classification. In IEEE international conference on image processing (ICIP), pp. 513–516.

  • Berg, A.C., Deng, J., & Fei-Fei, L. (2010). Large scale visual recognition challenge. http://www.imagenet.org/challenges/LSVRC/2010/.

  • Bo, L., & Sminchisescu, C. (2008). Greedy block coordinate descent for large scale gaussian process regression. In Uncertainty in artificial intelligence (UAI).

  • Bo, L., & Sminchisescu, C. (2010). Twin gaussian processes for structured prediction. International Journal of Computer Vision (IJCV), 87(1–2), 28–52.

  • Bo, L., & Sminchisescu, C. (2012). Greedy block coordinate descent for large scale gaussian process regression. Computing Research Repository (CoRR) abs/1206.3238, previous publication in Uncertainty in Artificial Intelligence (UAI).

  • Bonilla, E.V., Chai, K.M.A., & Williams, C.K.I. (2008). Multi-task gaussian process prediction. In Advances in Neural Information Processing Systems (NIPS) (pp. 153–160). Cambridge: MIT Press.

  • Bottou, L., Chapelle, O., DeCoste, D., & Weston, J. (Eds.). (2007). Large-scale kernel machines. Cambridge: MIT Press.

  • Boughorbel, S., Tarel, J.P., & Boujemaa, N. (2005). Generalized histogram intersection kernel for image recognition. In IEEE international conference on image processing (ICIP), pp. 161–164.

  • Chang, C. C., & Lin, C. J. (2011). Libsvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–27.

  • Deng, J., Berg, A., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In European Conference on Computer Vision (ECCV), pp. 71–84.

  • Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). Decaf: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531.

  • Ebert, S., Fritz, M., & Schiele, B. (2012). Ralf: A reinforced active learning formulation for object class recognition. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 3626–3633.

  • Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). Liblinear: A library for large linear classification. Journal of Machine Learning Research (JMLR), 9, 1871–1874.

  • Freytag, A., Fröhlich, B., Rodner, E., & Denzler, J. (2012a). Efficient semantic segmentation with gaussian processes and histogram intersection kernels. In International conference on pattern recognition (ICPR), pp. 3313–3316.

  • Freytag, A., Rodner, E., Bodesheim, P., & Denzler, J. (2012b). Rapid uncertainty computation with gaussian processes and histogram intersection kernels. In: Asian conference on computer vision (ACCV), pp. 511–524.

  • Freytag, A., Rodner, E., Bodesheim, P., & Denzler, J. (2013). Labeling examples that matter: Relevance-based active learning with gaussian processes. In German conference on pattern recognition (GCPR), pp. 282–291.

  • Freytag, A., Rodner, E., & Denzler, J. (2014a). Selecting influential examples: Active learning with expected model output changes. In: European conference on computer vision (ECCV).

  • Freytag, A., Rühle, J., Bodesheim, P., Rodner, E., & Denzler, J. (2014b). Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods. In: International conference on pattern recognition (ICPR)—FEAST workshop.

  • Grauman, K., & Darrell, T. (2007). The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research (JMLR), 8(Apr), 725–760.

  • He, H., & Siu, W.C. (2011). Single image super-resolution using gaussian process regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 449–456.

  • Hestenes, M. R., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409–436.

  • Käding, C., Freytag, A., Rodner, E., Bodesheim, P., & Denzler, J. (2015). Active learning and discovery of object categories in the presence of unnameable instances. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 4343–4352.

  • Kapoor, A., Grauman, K., Urtasun, R., & Darrell, T. (2010). Gaussian processes for object categorization. International Journal of Computer Vision (IJCV), 88(2), 169–188.

  • Kemmler, M., Rodner, E., & Denzler, J. (2010). One-class classification with gaussian processes. In: Asian conference on computer vision (ACCV), vol. 2, pp. 489–500.

  • Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS).

  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 2169–2178.

  • Maji, S., Berg, A.C., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8.

  • Maji, S., Berg, A. C., & Malik, J. (2013). Efficient classification for additive kernel svms. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(1), 66–77.

  • Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7(4), 308–313.

  • Nickisch, H., & Rasmussen, C. E. (2008). Approximations for binary gaussian process classification. Journal of Machine Learning Research, 9(10), 2035–2078.

  • Nocedal, J., & Wright, S. J. (2006). Conjugate gradient methods. New York: Springer.

  • Perronnin, F., Akata, Z., Harchaoui, Z., & Schmid, C. (2012). Towards good practice in large-scale learning for image classification. In IEEE conference on computer vision and pattern recognition (CVPR), pp. 3482–3489.

  • Pillonetto, G., Dinuzzo, F., & Nicolao, G. D. (2010). Bayesian online multitask learning of gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(2), 193–205.

  • Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 413–420.

  • Quiñonero Candela, J., & Rasmussen, C. E. (2005). A unifying view of sparse approximate gaussian process regression. Journal of Machine Learning Research (JMLR), 6, 1939–1959.

  • Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Adaptive computation and machine learning. Cambridge: The MIT Press.

  • Rodner, E. (2011). Learning from few examples for visual recognition problems. Hut Verlag München, http://herakles.inf-cv.uni-jena.de/biborb/bibs/cv/papers/Rodner11:Diss.pdf.

  • Rodner, E., Hegazy, D., & Denzler, J. (2010). Multiple kernel gaussian process classification for generic 3d object recognition from time-of-flight images. In: International conference on image and vision computing New Zealand (IVCNZ), pp. 1–8.

  • Rodner, E., Freytag, A., Bodesheim, P., & Denzler, J. (2012). Large-scale gaussian process classification with flexible adaptive histogram kernels. In European conference on computer vision (ECCV), vol. 4, pp. 85–98.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115, 1–42.

  • Schölkopf, B., & Smola, A. J. (2001). Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge: MIT Press.

  • Snoek, J., Larochelle, H., & Adams, R.P. (2012). Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (NIPS), pp. 2951–2959.

  • Sun, J., & Ponce, J. (2013). Learning discriminative part detectors for image classification and cosegmentation. In IEEE International conference on computer vision (ICCV).

  • Tong, S., & Koller, D. (2001). Support vector machine active learning with applications to text classification. Journal of Machine Learning Research (JMLR), 2(Nov), 45–66.

  • Urtasun, R., & Darrell, T. (2008). Sparse probabilistic regression for activity-independent human pose inference. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8.

  • Vapnik, V. (1998). Statistical learning theory (Vol. 2). New York: Wiley.

  • Vázquez, D., Marin, J., Lopez, A. M., Geronimo, D., & Ponsa, D. (2014). Virtual and real world adaptation for pedestrian detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(4), 797–809.

  • Vedaldi, A., & Zisserman, A. (2010). Efficient additive kernels via explicit feature maps. In: IEEE conference on computer vision and pattern recognition (CVPR), pp. 3539–3546.

  • Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In IEEE international conference on computer vision (ICCV), pp. 606–613.

  • Wang, G., Hoiem, D., & Forsyth, D. (2012). Learning image similarity from flickr groups using fast kernel machines. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(11), 2177–2188.

  • Williams, C.K.I., & Seeger, M. (2000). The effect of the input density distribution on kernel-based classifiers. In: International Conference on Machine Learning (ICML), pp. 1159–1166.

  • Wu, J. (2010). A fast dual method for hik svm learning. In: European conference on computer vision (ECCV), pp. 552–565.

  • Wu, J. (2012). Efficient hik svm learning for image classification. IEEE Transactions on Image Processing (TIP), 21(10), 4442–4453.

  • Yuan, Q., Thangali, A., Ablavsky, V., & Sclaroff, S. (2008). Multiplicative kernels: Object detection, segmentation and pose estimation. In Computer vision and pattern recognition (CVPR), IEEE, pp. 1–8.

  • Yuster, R. (2008). Matrix sparsification for rank and determinant computations via nested dissection. In: IEEE symposium on foundations of computer science (FOCS), pp. 137–145.

Acknowledgments

This research was partially supported by grant DE 735/10-1 of the German Research Foundation (DFG).

Author information

Corresponding author

Correspondence to Erik Rodner.

Additional information

Communicated by Raquel Urtasun.

Appendices

Quantization Error Analysis

Proof of Theorem 1

The quantization trick approximates a new example \(\mathbf {\varvec{x}}_*\) with a quantized version \(\tilde{\mathbf {\varvec{x}}}_*\). In the following, we assume that a quantization with \(q\) bins is given for each of the \(D\) dimensions. For each input value \(\mathbf {\varvec{x}}\left( d\right) \in [l_d, u_d ]\), we can then compute a quantized value \(\tilde{z}\). The maximum quantization error \(\varepsilon _{q}\) is defined as follows:

$$\begin{aligned} \varepsilon _{q}&= \max _{1 \le d \le D} \; \max _{\mathbf {\varvec{x}}\left( d\right) \in [l_d, u_d ]} | \mathbf {\varvec{x}}\left( d\right) - \tilde{z} |. \end{aligned}$$
(36)

Our goal is to analyze the difference between the exact predictive mean \(\mu _*\) and the one computed with the quantization trick \(\tilde{\mu }_*\):

$$\begin{aligned} | \mu _*- \tilde{\mu }_*|&= | \mathbf {\varvec{k}}_{*}^{\mathrm {T}} \mathbf {\varvec{\alpha }} - \tilde{\mathbf {\varvec{k}}}_*^{\mathrm {T}} \mathbf {\varvec{\alpha }} | \nonumber \\&= | (\mathbf {\varvec{k}}_{*}- \tilde{\mathbf {\varvec{k}}}_*)^{\mathrm {T}} \mathbf {\varvec{\alpha }} |\le \sum \limits _{i=1}^N| \triangle k\left( i\right) | | \mathbf {\varvec{\alpha }}\left( i\right) | \end{aligned}$$
(37)

where we have defined \(\triangle k\left( i\right) = \kappa (\mathbf {\varvec{x}}_{i}, \mathbf {\varvec{x}}_*) - \kappa (\mathbf {\varvec{x}}_{i}, \tilde{\mathbf {\varvec{x}}}_*)\) as an abbreviation. Let us analyze this term in more detail using the relation \(\min (a,b) = \frac{1}{2} \cdot (a + b - |a-b|)\):

$$\begin{aligned} | \triangle k\left( i\right) |&= | \kappa (\mathbf {\varvec{x}}_{i}, \mathbf {\varvec{x}}_*) - \kappa (\mathbf {\varvec{x}}_{i}, \tilde{\mathbf {\varvec{x}}}_*) | \nonumber \\&= \left| \sum \limits _{d=1}^D\min (\mathbf {\varvec{x}}_i\left( d\right) , \mathbf {\varvec{x}}_*\left( d\right) ) - \min (\mathbf {\varvec{x}}_i\left( d\right) , \tilde{\mathbf {\varvec{x}}}_*\left( d\right) ) \right| \nonumber \\&= \frac{1}{2} \left| \sum \limits _{d=1}^D\mathbf {\varvec{x}}_i\left( d\right) + \mathbf {\varvec{x}}_*\left( d\right) - |\mathbf {\varvec{x}}_i\left( d\right) - \mathbf {\varvec{x}}_*\left( d\right) |\right. \ldots \nonumber \\&\quad \left. \ldots -\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) + |\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) | \right| \nonumber \\&= \frac{1}{2} \left| \sum \limits _{d=1}^D\mathbf {\varvec{x}}_*\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) - |\mathbf {\varvec{x}}_i\left( d\right) - \mathbf {\varvec{x}}_*\left( d\right) | + \right. \ldots \nonumber \\&\quad \left. \ldots |\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) | \right| . \end{aligned}$$
(38)

We can now exploit the fact that \(\mathbf {\varvec{x}}_*\) is normalized to sum up to a constant value, e.g., 1. Furthermore, we assume that this also holds for \(\tilde{\mathbf {\varvec{x}}}_*\). This assumption is reasonable for a high dimension \(D\) and a quantization error with zero mean. Putting both aspects together, we obtain the following equality:

$$\begin{aligned} | \triangle k\left( i\right) |&= \frac{1}{2} \left| \sum \limits _{d=1}^D|\mathbf {\varvec{x}}_i\left( d\right) - \tilde{\mathbf {\varvec{x}}}_*\left( d\right) | - |\mathbf {\varvec{x}}_i\left( d\right) - \mathbf {\varvec{x}}_*\left( d\right) | \right| \nonumber \\&= \frac{1}{2} \left| \Vert \mathbf {\varvec{x}}_{i} - \tilde{\mathbf {\varvec{x}}}_*\Vert _1 - \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 \right| \end{aligned}$$
(39)

Due to the symmetry of this expression with respect to \(\mathbf {\varvec{x}}_*\) and \(\tilde{\mathbf {\varvec{x}}}_*\), we can assume without loss of generality that \(\Vert \mathbf {\varvec{x}}_{i} - \tilde{\mathbf {\varvec{x}}}_*\Vert _1 \ge \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1\). This allows us to apply the triangle inequality, and we finally obtain:

$$\begin{aligned} | \triangle k\left( i\right) |&= \frac{1}{2} \left( \Vert \mathbf {\varvec{x}}_{i} - \tilde{\mathbf {\varvec{x}}}_*\Vert _1 - \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 \right) \nonumber \\&\le \frac{1}{2} \left( \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 + \Vert \mathbf {\varvec{x}}_*- \tilde{\mathbf {\varvec{x}}}_*\Vert _1 - \Vert \mathbf {\varvec{x}}_{i} - \mathbf {\varvec{x}}_*\Vert _1 \right) \nonumber \\&= \frac{1}{2} \Vert \mathbf {\varvec{x}}_*- \tilde{\mathbf {\varvec{x}}}_*\Vert _1. \end{aligned}$$
(40)

This directly leads to \(|\triangle k\left( i\right) | \le \frac{D\cdot \varepsilon _{q}}{2}\). Combined with Eq. (37), this proves the final result of the theorem.

The bound can also be improved by considering the sum of the quantization errors over all dimensions, but we omit this refinement for brevity and ease of notation. \(\square \)
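
As a sanity check, the following minimal sketch (Python/NumPy; synthetic data, illustrative only, not the code used in the paper) quantizes a test example, re-normalizes it so that the equal-sum assumption used above holds exactly, and verifies that the deviation of the predictive mean stays below the bound \(\frac{D\cdot \varepsilon _{q}}{2} \sum _i |\mathbf {\varvec{\alpha }}\left( i\right) |\):

    import numpy as np

    rng = np.random.default_rng(0)
    N, D, q = 200, 50, 16                      # training size, dimensionality, bins per dimension

    def hik(X, y):
        # histogram intersection kernel values kappa(x_i, y) for all rows x_i of X
        return np.minimum(X, y).sum(axis=1)

    # L1-normalized histogram features (rows sum to one) and a surrogate weight vector alpha
    X = rng.random((N, D)); X /= X.sum(axis=1, keepdims=True)
    x_star = rng.random(D); x_star /= x_star.sum()
    alpha = rng.standard_normal(N)             # stands in for (K + sigma^2 I)^{-1} y

    # quantize each dimension to q uniform bins and re-normalize to keep the sum equal to one
    step = X.max(axis=0) / q
    x_tilde = np.round(x_star / step) * step
    x_tilde /= x_tilde.sum()
    eps_q = np.max(np.abs(x_star - x_tilde))   # maximum quantization error

    mu_exact = hik(X, x_star) @ alpha
    mu_quant = hik(X, x_tilde) @ alpha
    bound = 0.5 * D * eps_q * np.abs(alpha).sum()
    print(abs(mu_exact - mu_quant), "<=", bound)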

Proof of the Bound

Proof of Lemma 1

First note that due to the conditions of the lemma, the following holds: \(1 \le \mu _2 < \mu _1 \le N\) and \(\bar{t}> 0\). Furthermore, the bound is only valid for \(\beta \ne \bar{t}\), because otherwise the \(2 \times 2\) matrix within the bound would be singular.

We now start by deriving the coefficients for \(\mu _1\) and \(\mu _2\). The first part of Eq. (19) can be written as:

$$\begin{aligned}&\begin{bmatrix} \log \beta ,&\log \bar{t}\end{bmatrix} \begin{bmatrix} \beta&\overline{t} \\ \beta ^2&\overline{t}^2 \end{bmatrix}^{-1} \nonumber \\&\quad = \begin{bmatrix} \log \beta ,&\log \bar{t}\end{bmatrix} \left( \frac{1}{\beta \bar{t}^2 - \bar{t}\beta ^2} \begin{bmatrix} \bar{t}^2&-\bar{t}\\ -\beta ^2&\beta \end{bmatrix} \right) \nonumber \\&\quad = \frac{1}{\beta \bar{t}^2 - \bar{t}\beta ^2} \begin{bmatrix} \bar{t}^2 \log \beta - \beta ^2 \log \bar{t},&\beta \log \bar{t}- \bar{t}\log \beta \end{bmatrix} \nonumber \\&\quad = \frac{1}{\bar{t}- \beta } \begin{bmatrix} \dfrac{\log \beta }{\beta } \bar{t}- \dfrac{\log \bar{t}}{\bar{t}} \beta ,&\dfrac{\log \bar{t}}{\bar{t}} - \dfrac{\log \beta }{\beta } \end{bmatrix}. \end{aligned}$$
(41)

Therefore, we get the following short form of Eq. (19) with \(\beta =1\):

(42)

Let \(\tilde{\mu }_2\) with \(0 < \tilde{\mu }_2 \le \mu _2\) be a lower bound of the squared Frobenius norm. If we replace \(\mu _2\) with \(\tilde{\mu }_2\) in Eq. (42), we notice that the log-term increases and the denominator of the second part decreases. This directly leads us to the validity of the lemma. \(\square \)
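
The matrix inversion in Eq. (41) is easy to verify numerically; a minimal sketch (Python/NumPy) with arbitrary illustrative values for \(\beta \) and \(\bar{t}\), not taken from the paper:

    import numpy as np

    beta, tbar = 0.7, 2.3                      # arbitrary values with beta != tbar and tbar > 0

    lhs = np.array([np.log(beta), np.log(tbar)]) @ np.linalg.inv(
        np.array([[beta, tbar], [beta**2, tbar**2]]))

    # closed form from the last line of Eq. (41)
    rhs = np.array([np.log(beta) / beta * tbar - np.log(tbar) / tbar * beta,
                    np.log(tbar) / tbar - np.log(beta) / beta]) / (tbar - beta)

    print(np.allclose(lhs, rhs))               # True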

Proof of Lemma 2

(43)

Now, we show that the second term equals \(N\cdot \log \gamma \) by using the definition of \(\bar{t}\) and the weights for \(\mu _1\) and \(\mu _2\) as derived at the beginning of the proof of Lemma 1:

(44)

\(\square \)

Proof of Theorem 2

The first part of the inequality was proved by Bai and Golub (1997). The second part follows directly by applying Lemma 2 with \(\gamma = \frac{1}{\beta }\) and then using Lemma 1:

figure a

\(\square \)

Bounds on Quadratic Forms

From linear algebra we know that any real-valued, symmetric matrix \(\varvec{M} \in {\mathbb {R}}^{N\times N}\) can be decomposed as \(\varvec{M} = \varvec{U} \varvec{D} \varvec{U}^{\mathrm {T}}\), where \(\varvec{D}\) is a diagonal matrix containing the (real) eigenvalues of \(\varvec{M}\) and \(\varvec{U}\) is an orthogonal matrix of the same size as \(\varvec{M}\). Therefore, for any vector \(\mathbf {\varvec{x}}\in {\mathbb {R}}^{N}\) the following holds:

$$\begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}= \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{U} \varvec{D} \varvec{U}^{\mathrm {T}} \mathbf {\varvec{x}}= \tilde{\mathbf {\varvec{x}}}^{\mathrm {T}} \varvec{D} \tilde{\mathbf {\varvec{x}}} = \sum _{i = 1}^{N} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2. \end{aligned}$$
(46)

Here, \(\lambda _i\) denotes the \(i\)th eigenvalue of \(\varvec{M}\) in decreasing order, i.e., \(\lambda _1 \ge \cdots \ge \lambda _{N}\), and \(\tilde{\mathbf {\varvec{x}}} = \varvec{U}^{\mathrm {T}} \mathbf {\varvec{x}}\), so that \(\tilde{\mathbf {\varvec{x}}} \left( i\right) \) is the projection of \(\mathbf {\varvec{x}}\) onto the \(i\)th column of \(\varvec{U}\), which is the \(i\)th eigenvector of \(\varvec{M}\). As a result, we can bound the quadratic form in Eq. (46) as follows:

$$\begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}&= \sum _{i = 1}^{k} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 + \sum _{j = k+1}^{N} \lambda _j \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2 \end{aligned}$$
(47)
$$\begin{aligned}&\le \sum _{i = 1}^{k} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 + \lambda _{k+1} \sum _{j = k+1}^{N} \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2. \end{aligned}$$
(48)

Since \(\varvec{U}\) is orthogonal, multiplication with \(\varvec{U}\) or \(\varvec{U}^{\mathrm {T}}\) does not change the length of a vector, i.e., \(\left| \left| \varvec{U} \mathbf {\varvec{x}}\right| \right| = \left| \left| \mathbf {\varvec{x}}\right| \right| \) and hence \(\left| \left| \tilde{\mathbf {\varvec{x}}}\right| \right| = \left| \left| \mathbf {\varvec{x}}\right| \right| \). Therefore, we can obtain the following upper bound:

$$\begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}\le \sum _{i = 1}^{k} \lambda _i \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 + \lambda _{k+1} \Bigl (\left| \left| \mathbf {\varvec{x}}\right| \right| ^2 - \sum _{i = 1}^{k} \tilde{\mathbf {\varvec{x}}} \left( i\right) ^2 \Bigr ). \end{aligned}$$
(49)

Analogously, we obtain the following lower bound by considering the \(k\) smallest eigenvalues of \(\varvec{M}\) and bounding the remaining eigenvalues with the \((k+1)\)th smallest:

$$\begin{aligned} \begin{aligned} \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}&\ge \sum _{j = N-k+1}^{N} \lambda _j \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2 \\&\quad + \lambda _{N-k} \Bigl (\left| \left| \mathbf {\varvec{x}}\right| \right| ^2 - \sum _{j = N-k+1}^{N} \tilde{\mathbf {\varvec{x}}} \left( j\right) ^2 \Bigr ). \end{aligned} \end{aligned}$$
(50)

For the special case of \(k= 0\) in Eqs. (49) and (50), we obtain the well-known bounds for any positive definite matrix \(\varvec{M}\) and any vector \(\mathbf {\varvec{x}}\) of corresponding size:

$$\begin{aligned} \lambda _{N}(\varvec{M}) \left| \left| \mathbf {\varvec{x}}\right| \right| ^2 \ \le \ \mathbf {\varvec{x}}^{\mathrm {T}} \varvec{M} \mathbf {\varvec{x}}\ \le \ \lambda _{1}(\varvec{M}) \left| \left| \mathbf {\varvec{x}}\right| \right| ^2. \end{aligned}$$
(51)
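
These bounds are straightforward to check numerically. A minimal sketch (Python/NumPy; the matrix and vector are synthetic and purely illustrative) evaluating Eqs. (49)-(51) for a symmetric positive definite matrix:

    import numpy as np

    rng = np.random.default_rng(1)
    N, k = 8, 3

    A = rng.standard_normal((N, N))
    M = A @ A.T + N * np.eye(N)                # symmetric positive definite matrix
    x = rng.standard_normal(N)

    lam, U = np.linalg.eigh(M)                 # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]             # reorder: lam[0] >= ... >= lam[N-1]
    xt = U.T @ x                               # projections onto the eigenvectors
    quad = x @ M @ x

    # Eq. (49): upper bound using the k largest eigenvalues
    upper = (lam[:k] * xt[:k]**2).sum() + lam[k] * (x @ x - (xt[:k]**2).sum())
    # Eq. (50): lower bound using the k smallest eigenvalues
    lower = (lam[N-k:] * xt[N-k:]**2).sum() + lam[N-k-1] * (x @ x - (xt[N-k:]**2).sum())
    # Eq. (51): special case k = 0
    assert lam[-1] * (x @ x) <= quad <= lam[0] * (x @ x)
    assert lower <= quad <= upper
    print(lower, quad, upper)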

About this article

Cite this article

Rodner, E., Freytag, A., Bodesheim, P. et al. Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks. Int J Comput Vis 121, 253–280 (2017). https://doi.org/10.1007/s11263-016-0929-y

