A matter of notation: Several uses of the Kronecker product in 3D computer vision

doi:10.1016/j.patrec.2007.06.005

Pattern Recognition Letters

Volume 28, Issue 15, 1 November 2007, Pages 2127-2132

https://doi.org/10.1016/j.patrec.2007.06.005

Abstract

This work presents a number of cases in Computer Vision where the introduction of the Kronecker product allows more elegant and compact derivations. We hold that a clear notation can illuminate properties and catalyze reasoning. In particular, we introduce the trifocal matrix, which allows the trilinear constraints among three views to be expressed using familiar matrix algebra.

Introduction

The interest in the Kronecker product has grown recently, as witnessed by Van Loan (2000):

“The Kronecker product has a rich and very pleasing algebra that supports a wide range of fast, elegant, and practical algorithms. Several trends in scientific computing suggest that this important matrix operation will have an increasingly greater role to play in the future.”

In Computer Vision (CV), however, the Kronecker product has appeared only sporadically and has not been widely used. One of its first appearances is in (Mendonça, 2001), where it is used mainly to compute derivatives of matrix functions (Magnus and Neudecker, 1999). It was later exploited for the same purpose in (Fusiello et al., 2004). In (Chojnacki et al., 2003; Izquierdo and Guerra, 2003) the Kronecker product arises in the study of the pre-conditioning of the eight-point algorithm (Hartley, 1992). In (Brand, 2005) it is used in the context of non-rigid structure from motion. Despite these sporadic appearances, the Kronecker product has not gained the attention that it probably deserves.

First we describe the Kronecker product and some related matrix algebra tools. Then we apply these tools to the derivation of some classical linear algorithms in CV, such as the eight-point algorithm and the Direct Linear Transform (DLT). Next we re-derive Zhang’s calibration method and Fiore’s algorithm for exterior orientation. In all these cases the use of the Kronecker product and related tools yields a compact derivation, where the matrices never need to be expanded in terms of their entries. This makes it possible to reason about global properties of matrices, such as the rank.

The alternative derivations that we provide ultimately arrive at the same equations as the respective original algorithms. Hence, we refer the reader to the relevant papers for the discussion of their numerical properties.

In the last part we introduce the trifocal matrix which, thanks to the Kronecker product, makes it possible to express the trilinear constraints among three views with matrix algebra. Avoiding the tensorial notation is a great benefit in teaching, because in a typical CV course tensor algebra is introduced solely to support trifocal geometry and therefore constitutes a substantial overhead. Moreover, it is fairly unpalatable to students, who, in our experience, are more proficient with the more familiar matrix algebra. All previous attempts to avoid the tensorial notation (Ma et al., 2003; Hartley and Zisserman, 2003) sacrifice compactness, meaning that there is no single algebraic object that “represents” the trilinearity, as our trifocal matrix does.

Section snippets

Some matrix tools

This section develops some matrix tools related to the Kronecker product that will prove useful in the rest of the paper. Further reading on this topic can be found in (Horn and Johnson, 1994; Magnus and Neudecker, 1999).
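The snippet above does not reproduce the section's equations, so the following is only a minimal numerical sketch of two standard Kronecker identities that the later sections appear to rely on: the vectorization identity $\mathrm{vec}(A X B) = (B^{T} \otimes A)\,\mathrm{vec}(X)$ and the rank property $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B)$. The helper name `vec` and all test matrices are assumptions made for illustration; `vec` is taken to stack columns, which is the usual convention.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a single vector (column-major order)."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

# Vectorization identity: vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
vec_identity_holds = np.allclose(lhs, rhs)

# Rank property: rank(C kron D) = rank(C) * rank(D)
C = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])   # rank 1 by construction
D = np.diag([1.0, 1.0, 0.0])                      # rank 2 by construction
rank_kron = np.linalg.matrix_rank(np.kron(C, D))
rank_product = np.linalg.matrix_rank(C) * np.linalg.matrix_rank(D)
```

Note that NumPy stores arrays row-major by default, so the `order="F"` argument is what makes `vec` agree with the column-stacking convention.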

The eight-point algorithm

A number of 2D–2D point correspondences $m_{\ell}^{i} \leftrightarrow m_{r}^{i}$ (in homogeneous coordinates) are given, and we are required to find the fundamental matrix F that links corresponding points in the bilinear form: $m_{r}^{T} F m_{\ell} = 0.$ The eight-point algorithm (Hartley, 1992) exploits Eq. (9) to linearly compute F. Using the Kronecker product and Eq. (5), the derivation of the linear system of equations is particularly easy and elegant, given that one never needs to expand matrices into components
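As a hedged illustration of the idea, the bilinear constraint can be vectorized with the standard identity $\mathrm{vec}(A X B) = (B^{T} \otimes A)\,\mathrm{vec}(X)$, giving one row $(m_{\ell}^{T} \otimes m_{r}^{T})\,\mathrm{vec}(F) = 0$ per correspondence. The sketch below stacks these rows and takes the null vector by SVD. The function name, the synthetic cameras and the points are assumptions made for the example; coordinate normalization and the rank-2 enforcement on F are deliberately left out.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_matches(m_left, m_right):
    """Linear estimate of F from N >= 8 matches given as (N, 3) homogeneous arrays."""
    # Row i is (m_l^i)^T kron (m_r^i)^T, so that row @ vec(F) = m_r^T F m_l.
    U = np.stack([np.kron(ml, mr) for ml, mr in zip(m_left, m_right)])
    # vec(F) is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(U)
    return Vt[-1].reshape(3, 3, order="F")   # undo the column-major vec

# Synthetic, noise-free data (hypothetical): cameras [I | 0] and [R | t].
rng = np.random.default_rng(1)
M = np.column_stack([rng.uniform(-1, 1, (12, 2)),
                     rng.uniform(4, 6, 12),
                     np.ones(12)])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])
m_left = M @ np.hstack([np.eye(3), np.zeros((3, 1))]).T
m_right = M @ np.hstack([R, t[:, None]]).T

F_true = skew(t) @ R                      # valid here because both cameras have K = I
F_est = fundamental_from_matches(m_left, m_right)
residuals = np.einsum("ni,ij,nj->n", m_right, F_est, m_left)
```

The estimate is defined only up to scale and sign, so it should be compared with `F_true` after normalizing both matrices.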

The direct linear transform algorithm

The Direct Linear Transform (DLT) algorithm (Hartley and Zisserman, 2003) solves – with small variations – two different problems:

• Camera calibration (or resection);
• Homography estimation.

In this section the reader will appreciate the use of the Kronecker notation not only for its compactness, but also because of its rank property (Eq. (3)).
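A minimal sketch of the homography case, under the assumption that the derivation takes the usual route: $m' \simeq H m$ is rewritten as $[m']_{\times} H m = 0$ and vectorized into $(m^{T} \otimes [m']_{\times})\,\mathrm{vec}(H) = 0$. Each 3×9 block then has rank $\mathrm{rank}(m^{T}) \cdot \mathrm{rank}([m']_{\times}) = 1 \cdot 2 = 2$, which is why a correspondence contributes only two independent equations. The function names and the synthetic homography are assumptions made for illustration, and the data normalization used in practice is omitted.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def dlt_blocks(m_src, m_dst):
    """One 3x9 block (m^T kron [m']_x) per correspondence m <-> m'."""
    return [np.kron(m[None, :], skew(mp)) for m, mp in zip(m_src, m_dst)]

def homography_dlt(m_src, m_dst):
    """Linear estimate of H (up to scale) from N >= 4 correspondences."""
    A = np.vstack(dlt_blocks(m_src, m_dst))
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3, order="F")   # undo the column-major vec

# Synthetic, noise-free data (hypothetical homography and points).
H_true = np.array([[1.0, 0.2, 3.0],
                   [-0.1, 0.9, 1.0],
                   [0.001, 0.002, 1.0]])
rng = np.random.default_rng(2)
m_src = np.column_stack([rng.uniform(0, 10, (6, 2)), np.ones(6)])
m_dst = m_src @ H_true.T

H_est = homography_dlt(m_src, m_dst)
H_est = H_est / H_est[2, 2]                  # fix the free scale for comparison
block_ranks = [np.linalg.matrix_rank(b) for b in dlt_blocks(m_src, m_dst)]
```

The camera resection variant follows the same pattern with 3D points in place of `m_src` and a 3×4 unknown, so each block becomes 3×12.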

Zhang’s internal calibration

Here we will re-derive the core of Zhang’s calibration algorithm (Zhang, 2000), i.e., the procedure for computing the internal parameters of a camera starting from world-image homographies.

Several images of a known planar pattern are available, and it is assumed that correspondences between image points and 3D points on the planar pattern have been established in each view. We are required to find the camera’s internal parameters matrix K.

It is easy to see that for a camera P = K[R∣t] the
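The derivation is cut off in this snippet, so what follows is only a sketch of one way the well-known Zhang constraints can be written with Kronecker products, not necessarily the paper's own formulation. With $\omega = K^{-T} K^{-1}$ and $H = [h_1 \; h_2 \; h_3]$, the constraints $h_1^{T} \omega h_2 = 0$ and $h_1^{T} \omega h_1 = h_2^{T} \omega h_2$ become $(h_2^{T} \otimes h_1^{T})\,\mathrm{vec}(\omega) = 0$ and $(h_1^{T} \otimes h_1^{T} - h_2^{T} \otimes h_2^{T})\,\mathrm{vec}(\omega) = 0$. The duplication matrix used to impose symmetry, the Cholesky step, and every name and numeric value below are assumptions made for the example.

```python
import numpy as np

def duplication_matrix(n):
    """D such that vec(S) = D @ vech(S) for a symmetric n x n matrix S."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    col = 0
    for j in range(n):
        for i in range(j, n):
            D[j * n + i, col] = 1.0   # entry (i, j) in column-major vec
            D[i * n + j, col] = 1.0   # entry (j, i)
            col += 1
    return D

def intrinsics_from_homographies(homographies):
    """Recover K from at least three world-to-image homographies of a plane."""
    D = duplication_matrix(3)
    rows = []
    for H in homographies:
        h1, h2 = H[:, 0], H[:, 1]
        rows.append(np.kron(h2, h1))                     # h1^T w h2 = 0
        rows.append(np.kron(h1, h1) - np.kron(h2, h2))   # h1^T w h1 = h2^T w h2
    A = np.vstack(rows) @ D                              # unknown is vech(w)
    _, _, Vt = np.linalg.svd(A)
    w = (D @ Vt[-1]).reshape(3, 3, order="F")
    if w[2, 2] < 0:                                      # w must be positive definite
        w = -w
    L = np.linalg.cholesky(w)                            # w = L L^T with L ~ K^{-T}
    K = np.linalg.inv(L).T
    return K / K[2, 2]

def rotation(axis, angle):
    """Rodrigues formula for a rotation about a given axis."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    S = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * S + (1.0 - np.cos(angle)) * (S @ S)

# Synthetic, noise-free views of the plane Z = 0 (hypothetical values).
K_true = np.array([[1.2, 0.05, 0.1], [0.0, 1.1, -0.2], [0.0, 0.0, 1.0]])
poses = [([1.0, 0.2, 0.1], 0.4), ([0.1, 1.0, -0.3], -0.5), ([0.7, -0.6, 0.4], 0.6)]
t = np.array([0.2, -0.1, 4.0])
homographies = []
for k, (axis, angle) in enumerate(poses):
    R = rotation(axis, angle)
    H = K_true @ np.column_stack([R[:, 0], R[:, 1], t])
    homographies.append((k + 2.0) * H)                   # each H is known only up to scale
K_est = intrinsics_from_homographies(homographies)
```

The example uses intrinsics in normalized units to keep the linear system well scaled; with pixel-scale values some form of normalization would likely be needed.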

Exterior orientation

Given a number of 2D–3D point correspondences mⁱ ↔ Mⁱ (in homogeneous coordinates) and the intrinsic camera parameters K, we are required to find a rotation matrix R and a translation vector t (which specify attitude and position of the camera) such that: $K^{-1} m^{i} \simeq [R \mid t] M^{i} \quad \text{for all } i.$ The problem can be cast as a camera resection and solved with the DLT algorithm, but the resulting rotation matrix R is not guaranteed to be orthonormal. Hence, Fiore’s algorithm (Fiore, 2001) is to be preferred, which is
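The sketch below illustrates only the remark about the DLT route, not Fiore's algorithm, whose description is cut off in this snippet. It solves $(M^{T} \otimes [K^{-1} m]_{\times})\,\mathrm{vec}([R \mid t]) = 0$ on noisy synthetic data, measures how far the recovered 3×3 block is from orthonormal, and then replaces it with the nearest rotation via an SVD projection, a generic fix swapped in here purely for illustration. All names, the camera parameters and the noise level are assumptions.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def pose_by_resection(K, m, M):
    """DLT estimate of G = [R | t] from pixel points m (N, 3) and 3D points M (N, 4)."""
    q = np.linalg.solve(K, m.T).T                  # normalized image points K^{-1} m
    # [q]_x G M = 0 becomes (M^T kron [q]_x) vec(G) = 0
    A = np.vstack([np.kron(Mi[None, :], skew(qi)) for qi, Mi in zip(q, M)])
    _, _, Vt = np.linalg.svd(A)
    G = Vt[-1].reshape(3, 4, order="F")
    # Fix the free scale and sign so that the left 3x3 block has determinant +1.
    d = np.linalg.det(G[:, :3])
    return G * np.sign(d) / np.cbrt(abs(d))

def nearest_rotation(A):
    """Closest rotation to A in the Frobenius norm (assumes det(A) > 0)."""
    U, _, Vt = np.linalg.svd(A)
    return U @ Vt

# Synthetic data with a little pixel noise (hypothetical values).
rng = np.random.default_rng(3)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 5.0])
M = np.column_stack([rng.uniform(-1, 1, (20, 3)), np.ones(20)])
cam = M @ np.hstack([R_true, t_true[:, None]]).T
m = (cam / cam[:, 2:3]) @ K.T
m[:, :2] += rng.normal(scale=0.1, size=(20, 2))    # 0.1 pixel noise

G = pose_by_resection(K, m, M)
R_raw = G[:, :3]
R_fixed = nearest_rotation(R_raw)
raw_error = np.linalg.norm(R_raw.T @ R_raw - np.eye(3))
fixed_error = np.linalg.norm(R_fixed.T @ R_fixed - np.eye(3))
```

With noisy data `raw_error` is nonzero, which is the shortcoming the text attributes to the DLT solution; projecting afterwards restores orthonormality but is not equivalent to estimating the pose under that constraint from the start.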

The trifocal constraint

We have demonstrated how the Kronecker notation can yield compact and elegant derivations for some Computer Vision algorithms. We shall now demonstrate how the trifocal constraint can be introduced without resorting to trilinear tensors, thanks to the Kronecker product. This is probably the greatest merit of this notation.

Consider a point M in space projecting to m₁, m₂ and m₃ in the three cameras $P_{1} = [I \mid 0]$, $P_{2} = [A_{2} \mid e_{2,1}]$, and $P_{3} = [A_{3} \mid e_{3,1}]$. Let us write the epipolar line of m₁ in the other two views: $ζ_{2} m$

Conclusions

We have shown some applications of the Kronecker notation to 3D Computer Vision problems. We argued that this compact notation, especially in the case of the trifocal constraint, can be a practical aid for teaching and a fruitful tool for reasoning about the properties of the matrices that are involved.

Acknowledgement

Michela Farenzena read the draft and her comments helped to improve the presentation.

References (20)

Brand, M., 2005. A direct method for 3D factorization of nonrigid motion observed in 2D. In: Proc. IEEE Conf. on...
Chojnacki, W., et al., 2003. Revisiting Hartley’s normalized eight-point algorithm. IEEE Trans. Pattern Anal. Machine Intell.
Fiore, P.D., 2001. Efficient linear solution of exterior orientation. IEEE Trans. Pattern Anal. Machine Intell.
Fusiello, A., et al., 2004. Globally convergent autocalibration using interval analysis. IEEE Trans. Pattern Anal. Machine Intell.
Hartley, R.I., 1992. Estimation of relative camera position for uncalibrated cameras. In: Proc. European Conf. on...
Hartley, R.I., 1995. In defence of the 8-point algorithm. In: Proc. Internat. Conf. on Computer Vision, pp....
Hartley, R., 1997. Lines and points in three views and the trifocal tensor. Internat. J. Comput. Vision.
Hartley, R., et al., 2003. Multiple View Geometry in Computer Vision.
Horn, R., et al., 1994. Topics in Matrix Analysis.
Izquierdo, E., et al., 2003. Estimating the essential matrix by efficient linear techniques. IEEE Trans. Circuits Systems Video Technol.

There are more references available in the full text version of this article.
