A matter of notation: Several uses of the Kronecker product in 3D computer vision
Introduction
The interest in the Kronecker product has grown recently, as witnessed by Van Loan (2000):
“The Kronecker product has a rich and very pleasing algebra that supports a wide range of fast, elegant, and practical algorithms. Several trends in scientific computing suggest that this important matrix operation will have an increasingly greater role to play in the future.”
In Computer Vision (CV), however, the Kronecker product has appeared only sporadically and has not been widely used. One of its first appearances is in (Mendonça, 2001), where it is used mainly to compute derivatives of matrix functions (Magnus and Neudecker, 1999). It was later exploited for the same purpose in (Fusiello et al., 2004). In (Chojnacki et al., 2003, Izquierdo and Guerra, 2003) the Kronecker product arises in the study of the pre-conditioning of the eight-point algorithm (Hartley, 1992). In (Brand, 2005) it is used in the context of non-rigid structure from motion. Despite these sporadic appearances, the Kronecker product has not gained the attention that it probably deserves.
First we describe the Kronecker product and some related matrix algebra tools. We then apply these tools to the derivation of some classical linear algorithms in CV, such as the eight-point algorithm and the Direct Linear Transform (DLT). Next we re-derive Zhang’s calibration method and Fiore’s algorithm for exterior orientation. In all these cases the use of the Kronecker product and related tools yields a compact derivation in which the matrices never need to be expanded in terms of their entries. This allows one to reason about global properties of matrices, such as the rank.
The alternative derivations that we provide ultimately arrive at the same equations as the respective original algorithms. Hence, we refer the reader to the relevant papers for a discussion of their numerical properties.
In the last part we introduce the trifocal matrix which, thanks to the Kronecker product, makes it possible to express the trilinear constraints among three views with matrix algebra. Avoiding the tensorial notation is a great benefit in teaching, because in a typical CV course tensor algebra is introduced only to serve trifocal geometry, and hence constitutes a substantial overhead. Moreover, it is fairly unpalatable to students who, in our experience, are more proficient with the more familiar matrix algebra. All the previous attempts to avoid the tensorial notation (Ma et al., 2003, Hartley and Zisserman, 2003) sacrifice compactness, meaning that there is no single algebraic object that “represents” the trilinearity, as our trifocal matrix does.
Some matrix tools
This section develops some matrix tools related to the Kronecker product that will prove useful in the rest of the paper. Further reading on this topic can be found in (Horn and Johnson, 1994, Magnus and Neudecker, 1999).
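As a minimal numerical sketch of the kind of tools involved, the snippet below checks two standard Kronecker identities with NumPy: the "vec-trick" vec(AXB) = (Bᵀ ⊗ A) vec(X) and the rank property rank(A ⊗ B) = rank(A) rank(B). Which numbered equations of the paper these correspond to is an assumption, and the helper name `vec` and the test matrices are purely illustrative.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

# vec-trick: vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
vec_trick_holds = np.allclose(lhs, rhs)

# Rank property: rank(C kron D) = rank(C) * rank(D)
C = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])   # rank 1 by construction
D = np.array([[1.0, 2.0], [3.0, 4.0]])            # rank 2
rank_kron = np.linalg.matrix_rank(np.kron(C, D), tol=1e-9)

print(vec_trick_holds, rank_kron)
```

Note that `vec` must use column-major ordering for the identity to hold with `np.kron`; NumPy's default row-major `reshape` would correspond to a different (transposed) form of the identity.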
The eight-point algorithm
A number of 2D–2D point correspondences (in homogeneous coordinates) are given, and we are required to find the fundamental matrix F that links corresponding points in a bilinear form (Eq. (9)).
The eight-point algorithm (Hartley, 1992) exploits Eq. (9) to compute F linearly. Using the Kronecker product and Eq. (5), the derivation of the linear system of equations is particularly easy and elegant, given that one never needs to expand matrices into their components.
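The following sketch illustrates how such a linear system can be assembled with the Kronecker product. It assumes the epipolar constraint is written as m2ᵀ F m1 = 0 (the paper's own symbols may differ), so that each correspondence contributes the row (m1ᵀ ⊗ m2ᵀ) acting on vec(F). The function name `eight_point`, the synthetic cameras and the final rank-2 projection (a customary extra step, not stated in the text above) are assumptions made for illustration.

```python
import numpy as np

def eight_point(m1, m2):
    """Linear estimate of F from n >= 8 correspondences.

    m1, m2: 3 x n homogeneous points, assumed to satisfy m2_i^T F m1_i = 0.
    """
    n = m1.shape[1]
    # One row per correspondence: (m1^T kron m2^T) vec(F) = 0
    A = np.stack([np.kron(m1[:, i], m2[:, i]) for i in range(n)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3, order="F")      # undo the column-major vec
    # Customary extra step: project onto the rank-2 matrices
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt

# Synthetic, noise-free test data (illustrative only)
rng = np.random.default_rng(1)
n = 12
M = np.vstack([rng.uniform(-1, 1, (2, n)),
               rng.uniform(4, 6, (1, n)),
               np.ones((1, n))])             # 4 x n homogeneous 3D points
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([[1.0], [0.2], [0.1]])
m1 = np.hstack([np.eye(3), np.zeros((3, 1))]) @ M   # first camera [I | 0]
m2 = np.hstack([R, t]) @ M                          # second camera [R | t]

F = eight_point(m1, m2)
residuals = np.einsum("in,ij,jn->n", m2, F, m1)     # m2_i^T F m1_i for each i
print(np.max(np.abs(residuals)))
```

The matrix F is never expanded into its nine entries by hand; the ordering of the unknowns is fixed entirely by the vec convention.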
The direct linear transform algorithm
The Direct Linear Transform (DLT) algorithm (Hartley and Zisserman, 2003) solves – with small variations – two different problems:
- Camera calibration (or resection);
- Homography estimation.
In this section the reader will appreciate the use of the Kronecker notation not only for its compactness, but also because of its rank property (Eq. (3)).
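To illustrate both points on the homography case, the sketch below assumes the relation m2 ≃ H m1, rewritten as [m2]× H m1 = 0 and then as (m1ᵀ ⊗ [m2]×) vec(H) = 0. By the rank property of the Kronecker product, each 3 × 9 block has rank 1 · 2 = 2, so every correspondence yields only two independent equations. The names `skew` and `dlt_homography` and the test data are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def dlt_homography(m1, m2):
    """Estimate H (up to scale) with m2_i ~ H m1_i from n >= 4 correspondences."""
    n = m1.shape[1]
    # One 3 x 9 block per correspondence: (m1^T kron [m2]x) vec(H) = 0
    blocks = [np.kron(m1[:, i:i + 1].T, skew(m2[:, i])) for i in range(n)]
    _, _, Vt = np.linalg.svd(np.vstack(blocks))
    return Vt[-1].reshape(3, 3, order="F")

# Synthetic, noise-free test data (illustrative only)
rng = np.random.default_rng(2)
H_true = np.array([[1.0, 0.2, 3.0],
                   [-0.1, 0.9, -2.0],
                   [0.001, 0.002, 1.0]])
m1 = np.vstack([rng.uniform(0, 10, (2, 6)), np.ones((1, 6))])
m2 = (H_true @ m1) * rng.uniform(0.5, 2.0, 6)    # arbitrary projective scales

H_est = dlt_homography(m1, m2)
H_est = H_est / H_est[2, 2]                      # fix the free scale

# Rank of a single block: rank(m1^T) * rank([m2]x) = 1 * 2
block_rank = np.linalg.matrix_rank(
    np.kron(m1[:, :1].T, skew(m2[:, 0])), tol=1e-9)
print(block_rank)
```

Camera resection follows the same pattern with a 3 × 4 unknown matrix and 4-vectors for the 3D points, which is one reason the two problems are treated together.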
Zhang’s internal calibration
Here we will re-derive the core of Zhang’s calibration algorithm (Zhang, 2000), i.e., the procedure for computing the internal parameters of a camera starting from world-image homographies.
Several images of a known planar pattern are available, and it is assumed that correspondences between image points and 3D points on the planar pattern have been established in each view. We are required to find the camera’s internal parameters matrix K.
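A hedged sketch of how this problem can be set up with Kronecker notation follows. It assumes the usual formulation in which each world-image homography H = [h1 h2 h3] gives two linear constraints on the symmetric matrix B = K⁻ᵀK⁻¹, namely h1ᵀ B h2 = 0 and h1ᵀ B h1 = h2ᵀ B h2, and that symmetry is handled through a duplication matrix D with vec(B) = D vech(B). The function names, the synthetic poses and the use of a Cholesky factorization to recover K are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def duplication_matrix(n):
    """D with vec(S) = D @ vech(S) for symmetric n x n S (column-major)."""
    D = np.zeros((n * n, n * (n + 1) // 2))
    k = 0
    for j in range(n):
        for i in range(j, n):
            D[j * n + i, k] = 1.0    # position of entry (i, j) in vec(S)
            D[i * n + j, k] = 1.0    # position of entry (j, i) in vec(S)
            k += 1
    return D

def zhang_intrinsics(Hs):
    """Recover K (with K[2, 2] = 1) from >= 3 world-image homographies."""
    D = duplication_matrix(3)
    rows = []
    for H in Hs:
        h1, h2 = H[:, 0], H[:, 1]
        rows.append(np.kron(h2, h1))                    # h1^T B h2 = 0
        rows.append(np.kron(h1, h1) - np.kron(h2, h2))  # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.array(rows) @ D)
    B = (D @ Vt[-1]).reshape(3, 3, order="F")
    if B[2, 2] < 0:                  # fix the sign so that B is positive definite
        B = -B
    L = np.linalg.cholesky(B)        # B = L L^T with L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]

def rotation(axis, angle):
    """Rodrigues formula; used here only to synthesise test data."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    S = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * S + (1 - np.cos(angle)) * S @ S

# Synthetic homographies H = s K [r1 r2 t] for four plane poses
K_true = np.array([[2.0, 0.1, 0.5], [0.0, 1.8, 0.3], [0.0, 0.0, 1.0]])
poses = [([1, 0, 0], 0.3), ([0, 1, 0], -0.4),
         ([1, 1, 0.5], 0.5), ([1, -1, 0.2], 0.6)]
Hs = []
for k, (axis, angle) in enumerate(poses):
    R = rotation(axis, angle)
    t = np.array([0.1 * k, -0.2, 3.0 + k])
    Hs.append((1.0 + 0.5 * k) * K_true @ np.column_stack([R[:, 0], R[:, 1], t]))

K_est = zhang_intrinsics(Hs)
print(np.round(K_est, 6))
```

The test uses intrinsics of order one to keep the linear system well conditioned; with pixel-scale intrinsics some form of data normalization would normally be advisable.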
It is easy to see that for a camera P = K[R∣t] the
Exterior orientation
Given a number of 2D–3D point correspondences mi ↔ Mi (in homogeneous coordinates) and the intrinsic camera parameters K, we are required to find a rotation matrix R and a translation vector t (which specify the attitude and position of the camera) such that each Mi projects onto mi through the camera K[R∣t].
The problem can be cast as a camera resection and solved with the DLT algorithm, but the resulting rotation matrix R is not guaranteed to be orthonormal. Hence, Fiore’s algorithm (Fiore, 2001) is to be preferred, which is
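The sketch below illustrates one way to obtain an orthonormal R by construction, in the spirit of a two-stage approach: first the depths ζi are recovered linearly, then R and t follow from an orthogonal Procrustes fit. It assumes the model ζi K⁻¹mi = R Mi + t, image points normalized to last coordinate 1, and at least six non-coplanar points; it is an illustration under these assumptions and not a transcription of the published algorithm. All names are hypothetical.

```python
import numpy as np

def exterior_orientation(m, M, K):
    """m: 3 x n image points (last row 1), M: 3 x n world points, n >= 6."""
    n = m.shape[1]
    W = np.linalg.solve(K, m)                      # normalized rays K^{-1} m_i
    Mh = np.vstack([M, np.ones((1, n))])           # 4 x n homogeneous points
    _, _, Vt = np.linalg.svd(Mh)
    Vr = Vt[4:].T                                  # n x (n-4), Mh @ Vr = 0
    # W diag(z) Vr = 0  <=>  sum_i z_i (Vr[i] kron W[:, i]) = 0
    D = np.column_stack([np.kron(Vr[i], W[:, i]) for i in range(n)])
    _, _, Vt = np.linalg.svd(D)
    z = Vt[-1]                                     # depths up to a common scale
    if z.sum() < 0:
        z = -z                                     # points lie in front of the camera
    Y = W * z                                      # camera-frame points, up to scale
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Mc = M - M.mean(axis=1, keepdims=True)
    s = np.linalg.norm(Mc) / np.linalg.norm(Yc)    # recover the common scale
    U, _, Vt = np.linalg.svd(Yc @ Mc.T)
    R = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt   # proper rotation
    t = s * Y.mean(axis=1) - R @ M.mean(axis=1)
    return R, t

# Synthetic, noise-free test data (illustrative only)
rng = np.random.default_rng(4)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                             # make it a proper rotation
R_true, t_true = Q, np.array([0.2, -0.1, 5.0])
M = rng.uniform(-1, 1, (3, 8))
m = K @ (R_true @ M + t_true[:, None])
m = m / m[2]

R_est, t_est = exterior_orientation(m, M, K)
print(np.allclose(R_est, R_true, atol=1e-6))
```

The Kronecker product enters when the diagonal depth matrix is vectorized: only the columns of Vrᵀ ⊗ W that multiply the diagonal entries survive, which gives the matrix `D` above.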
The trifocal constraint
We have demonstrated how the Kronecker notation can yield compact and elegant derivations for some Computer Vision algorithms. We shall now demonstrate how the trifocal constraint can be introduced without resorting to trilinear tensors, thanks to the Kronecker product. This is probably the greatest merit of this notation.
Consider a point M in space projecting to m1, m2 and m3 in the three cameras. Let us write the epipolar line of m1 in the other two views:
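As an illustration of how a trilinear constraint can be written with matrix algebra alone, the sketch below assumes canonical cameras P1 = [I∣0], P2 = [A2∣e2] and P3 = [A3∣e3], and builds a 9 × 3 matrix T = (e3 ⊗ A2) − (A3 ⊗ e2). Under these assumptions T m1 is the vectorization of a matrix proportional to m2 e3ᵀ − e2 m3ᵀ up to the unknown depths, so that ([m3]× ⊗ [m2]×) T m1 = 0 for corresponding points. This construction and the names `trifocal_matrix` and `trifocal_residual` are assumptions made here and may differ in detail from the definition given in the paper.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def trifocal_matrix(A2, e2, A3, e3):
    """9 x 3 matrix T = (e3 kron A2) - (A3 kron e2) for cameras
    [I | 0], [A2 | e2], [A3 | e3] (assumed canonical form)."""
    return np.kron(e3.reshape(3, 1), A2) - np.kron(A3, e2.reshape(3, 1))

def trifocal_residual(T, m1, m2, m3):
    """Nine trilinear equations; all vanish for a true correspondence."""
    return np.kron(skew(m3), skew(m2)) @ T @ m1

# Synthetic test data (illustrative only)
rng = np.random.default_rng(3)
A2, A3 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
e2, e3 = rng.standard_normal(3), rng.standard_normal(3)
T = trifocal_matrix(A2, e2, A3, e3)

X = rng.uniform(-1, 1, 3) + np.array([0.0, 0.0, 4.0])   # a 3D point
m1 = X                       # projection through [I | 0] (homogeneous)
m2 = A2 @ X + e2             # projection through [A2 | e2]
m3 = A3 @ X + e3             # projection through [A3 | e3]

res_true = np.linalg.norm(trifocal_residual(T, m1, m2, m3))
res_wrong = np.linalg.norm(
    trifocal_residual(T, m1, m2, m3 + np.array([0.5, -0.3, 0.0])))
print(res_true, res_wrong)
```

By the rank property, ([m3]× ⊗ [m2]×) has rank 2 · 2 = 4, so only four of the nine equations are independent for each point triplet.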
Conclusions
We have shown some applications of the Kronecker notation to 3D Computer Vision problems. We argued that this compact notation, especially in the case of the trifocal constraint, can be a practical aid for teaching and a fruitful tool for reasoning about the properties of the matrices that are involved.
Acknowledgement
Michela Farenzena read the draft and her comments helped to improve the presentation.
References (20)
- Brand, M., 2005. A direct method for 3D factorization of nonrigid motion observed in 2D. In: Proc. IEEE Conf. on...
- Chojnacki et al., 2003. Revisiting Hartley’s normalized eight-point algorithm. IEEE Trans. Pattern Anal. Machine Intell.
- Fiore, 2001. Efficient linear solution of exterior orientation. IEEE Trans. Pattern Anal. Machine Intell.
- Fusiello et al., 2004. Globally convergent autocalibration using interval analysis. IEEE Trans. Pattern Anal. Machine Intell.
- Hartley, R.I., 1992. Estimation of relative camera position for uncalibrated cameras. In: Proc. European Conf. on...
- Hartley, R.I., 1995. In defence of the 8-point algorithm. In: Proc. Internat. Conf. on Computer Vision, pp....
- Hartley, R.I., 1997. Lines and points in three views and the trifocal tensor. Internat. J. Comput. Vision.
- Hartley and Zisserman, 2003. Multiple View Geometry in Computer Vision.
- Horn and Johnson, 1994. Topics in Matrix Analysis.
- Izquierdo and Guerra, 2003. Estimating the essential matrix by efficient linear techniques. IEEE Trans. Circuits Systems Video Technol.