1 Introduction

The aim of this paper is to clarify mathematical properties of pattern recognition for tensor data from the viewpoint of object oriented data analysis of volumetric data, which is dealt with in biomedical image analysis, retrieval and recognition. In traditional pattern recognition, sampled patterns for numerical computation are embedded in an appropriate-dimensional Euclidean space as vectors. The other way is to deal with sampled patterns as three-way array data. These three-way array data are expressed as tensors [6–10] to preserve the linearity of the original pattern space.

The subspace method based on the Karhunen-Loève transform is a fundamental technique in pattern recognition. Modern pattern recognition techniques are described using linear algebra for sampled values of patterns embedded in a vector space. Organs, cells and microstructures in cells dealt with in biomedical image analysis are volumetric data. From the viewpoint of object oriented data analysis [5], we are required to process and analyse these data as volumetric data, without embedding the sampled values in a vector space. We express the sampled values of volumetric data as three-way array data and process them as third-order tensors. This expression of data requires the development of a subspace method for tensor data.

We derive and clarify the mutual subspace and constrained mutual subspace methods for tensors using tensor PCA based on the Tucker-3 tensor decomposition. The mutual subspace method is stable against geometric perturbations of queries for pattern recognition, since the method assumes that a query is an element of a low-dimensional subspace. Furthermore, since the constrained subspace method eliminates the parts of the subspaces that are common among categories, the method achieves robust recognition against global deformations of queries in Hilbert space. Since the constrained mutual subspace method is a combination of the mutual subspace method and the constrained subspace method, the method is both stable and robust against geometric perturbations and global deformations of queries for pattern recognition.

2 Pattern Recognition in Vector Space

A volumetric pattern is assumed to be a square-integrable function in a linear space, defined on a finite support in three-dimensional Euclidean space [13], such that \(\int _{\varOmega }|f|^2d\varvec{x} <\infty \) for \(\varOmega \subset \mathbb {R}^3\). Furthermore, we assume \(\int _{\varOmega }|\nabla f|^2d\varvec{x}<\infty \) and \(\int _{\varOmega }tr \{ (\nabla \nabla ^\top f)^\top (\nabla \nabla ^\top f)\}d\varvec{x}<\infty \), where \(\nabla \nabla ^\top f\) is the Hessian matrix of f. For an orthogonal projection \(\varvec{P}\) and its complement \(\varvec{P}_\perp =\varvec{I}-\varvec{P}\), \(f^\parallel =\varvec{P}f\) and \(f^\perp =\varvec{P}_\perp f\) are the canonical element and canonical form of f with respect to \(\varvec{P}\) and \(\varvec{P}_\perp \), respectively. If \(\varvec{P}\) is the projection to the space spanned by the constant element, the operation \(\varvec{P}_\perp f\) is called constant canonicalisation. Let \(\varvec{P}_i\) be the orthogonal projection to the linear subspace corresponding to the category \(C_i\). For a pattern f, if \(|(\varvec{I}-\varvec{P}_{i^*})(f/|f|)|\le \delta \) for an appropriately small positive number \(\delta \), we conclude that \(f\in C_{i^*}\).

Setting \(\varvec{\delta }\) and \(\varepsilon \) to be a small vector and a small positive number, we have the relation

$$\begin{aligned} |f(\varvec{x}+\varvec{\delta })-(f(\varvec{x})+\varvec{\delta }^\top \nabla f+ \frac{1}{2} \varvec{\delta }^\top (\nabla \nabla ^\top f)\varvec{\delta }) |< \varepsilon , \end{aligned}$$
(1)

for local geometric perturbations. All of f, \(f_x\), \(f_y\), \(f_z\), \(f_{xx}\), \(f_{yy}\), \(f_{zz}\), \(f_{xy}\), \(f_{yz}\) and \(f_{zx}\) are independent if f is not sinusoidal in each direction. Therefore, Eq. (1) implies that, for a pattern defined on three-dimensional Euclidean space, the local dimension of the pattern is four if only local geometric perturbations (the gradient terms) are assumed, and ten if local bending deformations (the Hessian terms) are assumed in addition. This property of the local dimensionality allows us to establish the mutual subspace method, which deals with a query as a pattern in a subspace.

Setting (f, g) to be the inner product in the Hilbert space \(\mathfrak {H}\), the relation \(|f|^2=(f,f)\) is satisfied. Let \(\theta \) be the canonical angle between a pair of linear subspaces \(L_1\) and \(L_2\). Setting \(\varvec{P}_1\) and \(\varvec{P}_2\) to be the orthogonal projections to \(L_1\) and \(L_2\), respectively, \(\cos ^2\theta \) is the maximum of \((\varvec{P}_1 f, \varvec{P}_2g)^2\) with respect to the conditions \(|f|=1\), \(|g|=1\), \(\varvec{P}_1f=f\) and \(\varvec{P}_2g=g\). The relation \(\cos ^2\theta =\lambda _{\text{ max }}^2\) is satisfied, where \(\lambda _{\text{ max }}\) is the maximal singular value of \(\varvec{P}_2\varvec{P}_1\).
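As a minimal numerical sketch of this relation (with randomly generated orthonormal bases standing in for \(L_1\) and \(L_2\); all names are hypothetical), the canonical angle is obtained from the largest singular value of \(\varvec{P}_2\varvec{P}_1\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal bases of two subspaces L1 and L2 of R^10 (hypothetical data).
B1, _ = np.linalg.qr(rng.standard_normal((10, 3)))
B2, _ = np.linalg.qr(rng.standard_normal((10, 4)))

P1 = B1 @ B1.T                        # orthogonal projection onto L1
P2 = B2 @ B2.T                        # orthogonal projection onto L2

# cos(theta) of the first canonical angle is the maximal singular value of P2 P1.
lam_max = np.linalg.svd(P2 @ P1, compute_uv=False)[0]
theta = np.arccos(np.clip(lam_max, -1.0, 1.0))
print("first canonical angle:", theta)
```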

Since, in the mutual subspace method, a query f is expressed by using a set of local bases, we set \(\varvec{Q}_f\) to be the orthogonal projection to the linear subspace expressing the query f. Then, if the canonical angles satisfy the relation \(\angle (\varvec{Q}_f, \varvec{P}_{i^*})<\angle (\varvec{Q}_f, \varvec{P}_{i})\) for all \(i\ne i^*\), we conclude that \(f\in C_{i^*}\).

Setting \(\varvec{P}_i\) to be the orthogonal projection to the linear subspace \(\mathfrak {L}_i\) corresponding to the category \(C_i\), the orthogonal projection \(\varvec{Q}\) which maximises the criterion \( J=\sum _{i=1}^n |\varvec{Q}\varvec{P}_i|_2^2 \) with respect to the condition \(\varvec{Q}^*\varvec{Q}=\varvec{I}\), where \(\varvec{Q}^*\) is the conjugate of \(\varvec{Q}\) and \(|\varvec{A}|_2\) is the trace norm of the operator \(\varvec{A}\) in the Hilbert space \(\mathfrak {H}\), spans the subspace common to all categories. Since the operation \(\varvec{Q}f\) extracts the part of f common to all categories, the operation \((\varvec{I}-\varvec{Q})f\) removes this common part and preserves the parts of f that are essential for pattern recognition.

For f and g in \(\mathfrak {H}\), using an appropriate transform \(\mu \) from \(\mathfrak {H}\) to a subset of \(\mathfrak {H}\), we define a metric \(d(\mu (f), \mu (g))\) between \(\mu (f)\) and \(\mu (g)\). Furthermore, using an appropriate mapping \(\varPhi \), we define a measure

$$\begin{aligned} s(f,g)=\varPhi (d(\mu (f), \mu (g))). \end{aligned}$$
(2)

If we set \(\mu (f)=\frac{f}{|f|}\), set d to be the geodesic distance on the unit sphere in \(\mathfrak {H}\) and set \(\varPhi (x)=\cos x\), then s(f, g) becomes the similarity measure based on the angle between f and g. For \(f'=f+\delta _f\) and \(g'=g+\delta _g\), setting

$$\begin{aligned} \min (|f|,|g|)=\varLambda , \, \, \, \max (|\delta _f|,|\delta _g|)=\varDelta , \end{aligned}$$
(3)

we have the relation

$$\begin{aligned} \left| \left( \frac{f'}{|f'|}, \frac{g'}{|g'|}\right) -\left( \frac{f}{|f|}, \frac{g}{|g|}\right) \right| \le c\frac{\varDelta }{\varLambda }, \end{aligned}$$
(4)

for a positive constant c. Therefore, s(f, g) is stable and robust against perturbations and noise in f and g.

For patterns in \(\mathfrak {H}\), we have the following property.

Property 1

For \(|f|=1\) and \(|g|=1\), assuming \(|f-g|\le \frac{1}{3}\cdot \frac{\pi }{2}\), the geodesic distance \(\theta =d_{S}(f,g)\) between f and g satisfies the relation \(|\theta -|f-g||< \varepsilon \) for a small positive number \(\varepsilon \).
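A quick numerical check of Property 1 (a sketch in which random unit vectors stand in for the normalised patterns f and g):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(64)
f /= np.linalg.norm(f)
g = f + 0.1 * rng.standard_normal(64)          # small perturbation so that |f - g| is small
g /= np.linalg.norm(g)

chord = np.linalg.norm(f - g)                  # |f - g|
theta = np.arccos(np.clip(f @ g, -1.0, 1.0))   # geodesic distance on the unit sphere
print(chord, theta, abs(theta - chord))        # the difference is small for small |f - g|
```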

In traditional pattern recognition, sampled patterns are embedded in an appropriate-dimensional Euclidean space as vectors. For \(\varvec{x}\in \mathbb {R}^n\) and \(\varvec{X}\in \mathbb {R}^{n\times n}\), \(|\varvec{x}|_2\) and \(|\varvec{X}|_F\) are the Euclidean norm and the Frobenius norm of \(\varvec{x}\) and \(\varvec{X}\), respectively.

Setting the data matrix \(\varvec{X}\) to be \(\varvec{X}= \left( \varvec{f}_1,\varvec{f}_2,\cdots , \varvec{f}_m \right) \) for data vectors \(\{\varvec{f}_i\}_{i=1}^m\) in \(\mathbb {R}^N\), whose mean is zero, the Karhunen-Loève transform is established by computing \(\hat{\varvec{f}}_i=\varvec{U}\varvec{f}_i\) for the matrix \(\varvec{U}\) which maximises \( J_1=|\varvec{U}\varvec{X}|_{F}^2 \) with the condition \(\varvec{U}^\top \varvec{U}=\varvec{I}_N\). The orthogonal matrix \(\varvec{U}\) is the maximiser of

$$\begin{aligned} J_{11}=|\varvec{U}\varvec{X}|_{F}^2 +\langle (\varvec{U}^\top \varvec{U}-\varvec{I})\varvec{\varLambda }\rangle \end{aligned}$$
(5)

where

$$\begin{aligned} \varvec{\varLambda }=Diag(\lambda _1,\lambda _2,\cdots ,\lambda _N) \end{aligned}$$
(6)

for

$$\begin{aligned} \lambda _1\ge \lambda _2\ge \lambda _2 \ge \cdots \ge \lambda _N \ge 0. \end{aligned}$$
(7)

The maximiser of Eq. (5) is the solution of the eigenmatrix problem

$$\begin{aligned} \varvec{M}\varvec{U}=\varvec{U}\varvec{\varLambda }, \, \, \varvec{M}=\varvec{X}\varvec{X}^\top \end{aligned}$$
(8)

The row vectors of \(\varvec{U}\) are the principal components.

The compression of \(\varvec{f}_i\) to a low-dimensional linear subspace is achieved by computing the transform \(\varvec{P}_n\varvec{U}\varvec{f}_i\), where \(\varvec{P}_n\), for \(n<N\), is the orthogonal projection such that

$$\begin{aligned} \varvec{P}_n= \left( \begin{array}{cc} \varvec{I}_n&{}\varvec{O}\\ \varvec{O}^\top &{}\varvec{O} \end{array}\right) . \end{aligned}$$
(9)
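A minimal sketch of the Karhunen-Loève transform of Eq. (8) and the compression of Eq. (9); the data matrix and the reduced dimension n are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, n = 32, 200, 5                     # ambient dimension, number of samples, reduced dimension

X = rng.standard_normal((N, m))
X -= X.mean(axis=1, keepdims=True)       # zero-mean data vectors as the columns of X

M = X @ X.T                              # M = X X^T of Eq. (8)
lam, vecs = np.linalg.eigh(M)            # eigh returns the eigenvalues in ascending order
U = vecs[:, np.argsort(lam)[::-1]].T     # rows of U are the principal components

f_hat = U @ X[:, 0]                      # Karhunen-Loeve transform of the first sample
f_low = f_hat[:n]                        # P_n U f of Eq. (9): keep the first n components
print(f_low)
```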

3 Pattern Recognition in Multi-linear Forms

For the triplet of positive integers \(I_1\), \(I_2\) and \(I_3\), a third-order tensor in \(\mathbb {R}^{I_1 \times I_2 \times I_3}\) is expressed as \(\mathcal {X}=((x_{ijk}))\). The indices i, j and k are called the 1-mode, 2-mode and 3-mode of \(\mathcal {X}\), respectively. The tensor space \(\mathbb {R}^{I_1\times I_2\times I_3}\) is interpreted as the Kronecker product of three vector spaces \(\mathbb {R}^{I_1}\), \(\mathbb {R}^{I_2}\) and \(\mathbb {R}^{I_3}\) such that \(\mathbb {R}^{I_1} \otimes \mathbb {R}^{I_2} \otimes \mathbb {R}^{I_3}\). We set \(I=\max (I_1,I_2,I_3)\).

For a square-integrable function \(f(\varvec{x})\), which is zero outside of a finite support \(\varOmega \) in three-dimensional Euclidean space, the samples \(Sf(\varDelta \mathbf{z})\) for \(\mathbf{z}\in \mathbf{Z}^3\) and \(|\mathbf{z}|_\infty \le I\) define an \(I \times I \times I \) three-way array \(\mathbf{F}\). To preserve the multi-linearity of the function \(f(\varvec{x})\), we deal with the array \(\mathbf{F}\) as a third-order tensor \(\mathcal {F}\). The operation \(vec\,\mathcal {F}\) derives a vector \(\varvec{f}\in \mathbb {R}^{I_{123}}\) for \(I_{123}=I_1\cdot I_2\cdot I_3\). We can reconstruct f from \( \mathcal {F}\) using an interpolation procedure.

For \(\mathcal {X}\), the n-mode vectors, \(n=1,2,3\), are defined as the \(I_n\)-dimensional vectors obtained from \(\mathcal {X}\) by varying the index \(i_n\) while fixing all the other indices.

The unfolding of \(\mathcal {X}\) along the n-mode vectors of \(\mathcal {X}\) is defined as matrices such that

$$\begin{aligned} \mathcal {X}_{(1)} \in \mathbb {R}^{I_1 \times I_{23}}, \, \, \mathcal {X}_{(2)} \in \mathbb {R}^{I_2 \times I_{13}}, \, \, \mathcal {X}_{(3)} \in \mathbb {R}^{I_3 \times I_{12}} \end{aligned}$$
(10)

for \(I_{12}=I_1\cdot I_2\), \(I_{23}=I_2\cdot I_3\) and \(I_{13}=I_1\cdot I_3\), where the column vectors of \(\mathcal {X}_{(j)}\) are the j-mode vectors of \(\mathcal {X}\) for \(j=1,2,3\). We express the j-mode unfolding of \(\mathcal {X}_i\) as \(\mathcal {X}_{i, (j)}\).
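A sketch of the n-mode unfolding for a three-way numpy array; the ordering of the columns depends on the chosen convention, and any fixed ordering of the n-mode vectors serves the purpose here:

```python
import numpy as np

def unfold(X, mode):
    """n-mode unfolding: the columns of the result are the n-mode vectors of X."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

X = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)      # a small I1 x I2 x I3 tensor
X1, X2, X3 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
print(X1.shape, X2.shape, X3.shape)                          # (2, 12) (3, 8) (4, 6)
```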

For matrices

$$\begin{aligned} \varvec{U}=((u_{ii'}))\in \mathbb {R}^{ I_1 \times I_1}, \, \, \varvec{V}=((v_{jj'}))\in \mathbb {R}^{ I_2 \times I_2}, \, \, \varvec{W}=((w_{kk'}))\in \mathbb {R}^{ I_3 \times I_3}, \end{aligned}$$
(11)

the n-mode products for \(n=1,2,3\) of a tensor \(\mathcal {X}\) are the tensors with entries

$$\begin{aligned} (\mathcal {X}\times _1 \varvec{U})_{ijk}&= \sum _{i'=1}^{I_1} x_{i'jk}u_{i'i}, \nonumber \\ (\mathcal {X}\times _2 \varvec{V})_{ijk}&= \sum _{j'=1}^{I_2} x_{ij'k}v_{j'j}, \\ (\mathcal {X}\times _3 \varvec{W})_{ijk}&= \sum _{k'=1}^{I_3} x_{ijk'}w_{k'k}, \nonumber \end{aligned}$$
(12)

where \((\mathcal {X})_{ijk}=x_{ijk}\) is the ijk-th element of the tensor \(\mathcal {X}\). The inner product of two tensors \(\mathcal {X}\) and \(\mathcal {Y}\) in \(\mathbb {R}^{I_1 \times I_2 \times I_3}\) is

$$\begin{aligned} \langle \mathcal {X}, \mathcal {Y} \rangle = \sum _{i=1}^{I_1} \sum _{j=1}^{I_2} \sum _{k=1}^{I_3} x_{ijk} y_{ijk}. \end{aligned}$$
(13)

Using this inner product, we have the Frobenius norm of a tensor \(\mathcal {X}\) as \(| \mathcal {X} |_{F} = \sqrt{\langle \mathcal {X}, \mathcal {X} \rangle }\). The Frobenius norm \(|\mathcal {X}|_{F}\) of the tensor \(\mathcal {X}\) satisfies the relation \(|\mathcal {X}|_{F}=|\varvec{f}|_2\), where \(|\varvec{f}|_2\) is the Euclidean norm of the vector \(\varvec{f}\).

To project a tensor \(\mathcal {X}\) in \(\mathbb {R}^{I_1\times I_2\times I_3}\) to the tensor \(\mathcal {Y}\) in a lower-dimensional tensor space \(\mathbb {R}^{P_1 \times P_2 \times P_3}\), where \(P_n \le I_n \), three projection matrices \(\{ \varvec{U}^{(n)} \}_{n=1}^3\) with \(\varvec{U}^{(n)}\in \mathbb {R}^{I_n\times P_n}\) are required. Using these three projection matrices, we have the tensor orthogonal projection such that

$$\begin{aligned} \mathcal {Y} = \mathcal {X} \times _{1} \varvec{U}^{(1) \top } \times _{2} \varvec{U}^{(2) \top } \times _{3} \varvec{U}^{(3) \top }. \end{aligned}$$
(14)

This projection is established in three steps, where in each step, each n-mode vector is projected to a \(P_n\)-dimensional space by \(\varvec{U}^{(n)}\) for \(n=1,2,3\).
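The following sketch illustrates the mode products of Eq. (12), the projection of Eq. (14) and the relation \(|\mathcal {X}|_{F}=|\varvec{f}|_2\); the tensor and the factor matrices are hypothetical:

```python
import numpy as np

def mode_product(X, A, mode):
    """Mode product in the convention of Eq. (12): the `mode` index of X is contracted
    with the row index of A, so the new index runs over the columns of A."""
    return np.moveaxis(np.tensordot(A, X, axes=(0, mode)), 0, mode)

rng = np.random.default_rng(3)
X = rng.standard_normal((8, 9, 10))

# Hypothetical factor matrices U^(n) in R^{I_n x P_n} with orthonormal columns.
U1, _ = np.linalg.qr(rng.standard_normal((8, 3)))
U2, _ = np.linalg.qr(rng.standard_normal((9, 4)))
U3, _ = np.linalg.qr(rng.standard_normal((10, 5)))

# Eq. (14): every n-mode vector of X is replaced by its P_n coefficients
# with respect to the columns of U^(n).
Y = mode_product(mode_product(mode_product(X, U1, 0), U2, 1), U3, 2)
print(Y.shape)                                     # (3, 4, 5)

# The Frobenius norm of a tensor equals the Euclidean norm of its vectorisation.
print(np.linalg.norm(X), np.linalg.norm(X.ravel()))
```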

For a collection of matrices \(\{ \varvec{F}_i \}_{i=1}^N \subset \mathbb {R}^{m \times n}\) satisfying \(\mathrm {E}_i( \varvec{F}_i) = 0\), the orthogonal-projection-based data reduction \(\hat{\varvec{F}}_i = \varvec{U}^{\top }\varvec{F}_i\varvec{V}\) is performed by maximising

$$\begin{aligned} J_{2}(\varvec{U}, \varvec{V}) = \mathrm {E}_i \left( |\varvec{U}\hat{\varvec{F}}_i\varvec{V}^{\top } |_{\mathrm {F}}^2 \right) \end{aligned}$$
(15)

with respect to the conditions \(\varvec{U}^{\top }\varvec{U}= \varvec{I}_m\) and \(\varvec{V}^{\top }\varvec{V}= \varvec{I}_n\). The solutions are the extremals of the Euler-Lagrange equation

$$\begin{aligned} J_{22}(\varvec{U},\varvec{V})= \mathrm {E} \left( |\varvec{U}\hat{\varvec{F}}_i\varvec{V}^{\top } |_{\mathrm {F}}^2 \right) +\langle (\varvec{I}_m-\varvec{U}^\top \varvec{U}), \varvec{\Sigma }\rangle +\langle (\varvec{I}_n-\varvec{V}^\top \varvec{V}), \varvec{\varLambda }\rangle \end{aligned}$$
(16)

for diagonal matrices \(\varvec{\varLambda }\) and \(\varvec{\Sigma }\) of Lagrange multipliers.

Setting

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^N \varvec{F}_i^{\top } \varvec{F}_i = \varvec{M}, \, \, \frac{1}{N}\sum _{i=1}^N \varvec{F}_i \varvec{F}_i^{\top } = \varvec{N}, \end{aligned}$$
(17)

\(\varvec{U}\) and \(\varvec{V}\) are the solutions of the eigendecomposition problems

$$\begin{aligned} \varvec{M}\varvec{V} =\varvec{V}\varvec{\varLambda }, \, \, \varvec{N}\varvec{U} = \varvec{U}\varvec{\Sigma }, \end{aligned}$$
(18)

where \(\varvec{\Sigma }\in \mathbb {R}^{m \times m}\) and \(\varvec{\varLambda }\in \mathbb {R}^{n \times n}\) are diagonal matrices satisfying the relationships \(\lambda _i = \sigma _i\) for

$$\begin{aligned} \varvec{\Sigma }&= \mathrm {diag}(\sigma _1,\sigma _2,\dots ,\sigma _K,0,\dots ,0),\end{aligned}$$
(19)
$$\begin{aligned} \varvec{\varLambda }&= \mathrm {diag}(\lambda _1, \lambda _2,\dots ,\lambda _K,0,\dots ,0). \end{aligned}$$
(20)
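A sketch of the two-sided matrix reduction of Eqs. (15)-(18) for a collection of zero-mean matrices; the data and the retained sizes k and l are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, N = 16, 12, 100
F = rng.standard_normal((N, m, n))
F -= F.mean(axis=0)                      # E_i(F_i) = 0

M = sum(Fi.T @ Fi for Fi in F) / N       # Eq. (17), an n x n matrix
Nmat = sum(Fi @ Fi.T for Fi in F) / N    # Eq. (17), an m x m matrix

lamM, V = np.linalg.eigh(M)              # M V = V Lambda of Eq. (18)
lamN, U = np.linalg.eigh(Nmat)           # N U = U Sigma of Eq. (18)
V = V[:, np.argsort(lamM)[::-1]]
U = U[:, np.argsort(lamN)[::-1]]

k, l = 4, 3                              # hypothetical reduced sizes
F0_hat = U[:, :k].T @ F[0] @ V[:, :l]    # two-sided reduction of the first sample
print(F0_hat.shape)                      # (4, 3)
```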

The equation

$$\begin{aligned} (\varvec{P}_1\varvec{U})^{\top }\varvec{X}(\varvec{P}_2\varvec{V}) = \varvec{Y} \end{aligned}$$
(21)

is rewritten as

$$\begin{aligned} (\varvec{P}_2\varvec{V}\otimes \varvec{P}_1\varvec{U})^\top \mathrm {vec}\,\varvec{X} =(\varvec{P_2}\otimes \varvec{P_1})^\top (\varvec{V}\otimes \varvec{U})^\top \mathrm {vec}\,\varvec{X} =\varvec{P}\,\mathrm {vec}\,\varvec{X} =\mathrm {vec}\,\varvec{Y}. \end{aligned}$$
(22)

Using three projection matrices \(\varvec{U}^{(i)}\) for \(i=1,2,3\), we have the tensor orthogonal projection for a third-order tensor as

$$\begin{aligned} \mathcal {Y} = \mathcal {X} \times _1 \varvec{U}^{(1)\top } \times _2 \varvec{U}^{(2)\top } \times _3 \varvec{U}^{(3)\top }. \end{aligned}$$
(23)

For a collection \(\{\mathcal {X}_k\}_{k=1}^m\) of the third-order tensors, the orthogonal-projection-based dimension reduction procedure is achieved by maximising the criterion

$$\begin{aligned} J_3=E_{k}( |\mathcal {X}_k\times _1\varvec{U}^{(1)}\times _2\varvec{U}^{(2)}\times _3\varvec{U}^{(3)} |_{F}^2) \end{aligned}$$
(24)

with respect to the conditions \(\varvec{U}^{(i)^\top }\varvec{U}^{(i)}=\varvec{I}\) for \(i=1,2,3\). The Euler-Lagrange equation of this conditional optimisation problem is

$$\begin{aligned} J_{33}(\varvec{U}^{(1)}, \varvec{U}^{(2)}, \varvec{U}^{(3)} )&= E_{k}( |\mathcal {X}_k\times _1\varvec{U}^{(1)}\times _2\varvec{U}^{(2)}\times _3\varvec{U}^{(3)} |_{F}^2)\nonumber \\&\quad +\sum _{i=1}^3\langle (\varvec{I}-\varvec{U}^{(i)^\top }\varvec{U}^{(i)}), \varvec{\varLambda }^{(i)}\rangle . \end{aligned}$$
(25)

This optimisation problem is solved by an iterative procedure [11].

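A hedged sketch of an alternating iteration of this type, under the assumption that it follows the standard scheme of updating one factor at a time from the mode-wise covariance of the partially projected tensors; it need not coincide in every detail with the procedure of [11]:

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def tensor_pca_iterative(tensors, ranks, n_iter=20):
    """Alternating maximisation of J_3: U^(j) is updated from the j-mode unfoldings
    of the tensors after projecting the other two modes by the current factors."""
    dims = tensors[0].shape
    U = [np.eye(dims[j])[:, :ranks[j]] for j in range(3)]
    for _ in range(n_iter):
        for j in range(3):
            a, b = [k for k in range(3) if k != j]
            M = np.zeros((dims[j], dims[j]))
            for X in tensors:
                Y = np.moveaxis(np.tensordot(U[a].T, X, axes=(1, a)), 0, a)
                Y = np.moveaxis(np.tensordot(U[b].T, Y, axes=(1, b)), 0, b)
                Yj = unfold(Y, j)
                M += Yj @ Yj.T                       # accumulate the j-mode covariance
            lam, vecs = np.linalg.eigh(M)
            U[j] = vecs[:, np.argsort(lam)[::-1][:ranks[j]]]
    return U

rng = np.random.default_rng(5)
data = [rng.standard_normal((6, 7, 8)) for _ in range(30)]   # hypothetical zero-mean tensors
U1, U2, U3 = tensor_pca_iterative(data, ranks=(2, 3, 3))
print(U1.shape, U2.shape, U3.shape)                          # (6, 2) (7, 3) (8, 3)
```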

As a relaxation of the iteration algorithm, we define the system of optimisation problems

$$\begin{aligned} J_j= E(|\varvec{U}^{(j)\top }\mathcal {X}_{i,(j)}|_{F}^2) +\langle (\varvec{U}^{(j)\top }\varvec{U}^{(j)}-\varvec{I}_j), \varvec{\varLambda }^{(j)}\rangle \end{aligned}$$
(26)

for \(j=1,2,3\), where \(\mathcal {X}_{i,(j)}\) is the j-mode unfolding of the tensor \(\mathcal {X}_{i}\). These optimisation problems derive a system of eigenmatrix problems

$$\begin{aligned} \varvec{M}^{(j)}\varvec{U}^{(j)}=\varvec{U}^{(j)}\varvec{\varLambda }^{(j)},\, \, \, \varvec{M}^{(j)}= \frac{1}{N}\sum _{i=1}^N \mathcal {X}_{i,(j)}\mathcal {X}_{i,(j)}^\top \end{aligned}$$
(27)

for \(j=1,2,3\).

Setting \(\varvec{P}^{(j)}\) to be an orthogonal projection in the linear space \(\mathcal {L}(\{ \varvec{u}_i^{(j)} \}_{i=1}^{I_j})\) spanned by the column vectors of \(\varvec{U}^{(j)}\), data reduction is computed by

$$\begin{aligned} \mathcal {Y}= \mathcal {X} \times _1 \varvec{P}^{(1)} \varvec{U}^{(1)} \times _2 \varvec{P}^{(2)} \varvec{U}^{(2)} \times _3 \varvec{P}^{(3)} \varvec{U}^{(3)}. \end{aligned}$$
(28)

This expression is equivalent to the vector form as

$$\begin{aligned} vec \mathcal {Y}= (\varvec{P}^{(3)}\otimes \varvec{P}^{(2)} \otimes \varvec{P}^{(1)}) (\varvec{U}^{(3)}\otimes \varvec{U}^{(2)} \otimes \varvec{U}^{(1)}) vec\mathcal {X}. \end{aligned}$$
(29)
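A minimal sketch of this relaxed, mode-wise solution (Eqs. (26)-(28)); the data and the retained dimensions are hypothetical:

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

rng = np.random.default_rng(6)
data = [rng.standard_normal((6, 7, 8)) for _ in range(30)]

# Eq. (27): mode-wise eigenmatrix problems M^(j) U^(j) = U^(j) Lambda^(j).
U = []
for j in range(3):
    Mj = sum(unfold(X, j) @ unfold(X, j).T for X in data) / len(data)
    lam, vecs = np.linalg.eigh(Mj)
    U.append(vecs[:, np.argsort(lam)[::-1]])

# Eq. (28): keep the leading p_j columns of U^(j) in each mode and project a tensor.
p = (2, 3, 3)                                   # hypothetical reduced dimensions
X = data[0]
Y = np.einsum('abc,ai,bj,ck->ijk', X, U[0][:, :p[0]], U[1][:, :p[1]], U[2][:, :p[2]])
print(Y.shape)                                  # (2, 3, 3)
```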

The eigenvalues of eigenmatrices of Tucker-3 orthogonal decomposition satisfy the following theorem.

Theorem 1

The eigenvalues of \(\varvec{U}=\varvec{U}^{(1)}\otimes \varvec{U}^{(2)}\otimes \varvec{U}^{(3)}\) define a semi-order.

(Proof). For the eigenvalues \(\lambda _i^{(1)}, \lambda _j^{(2)}, \lambda _k^{(3)}\) of the 1-, 2- and 3-modes of the tensors, the inequalities \( \lambda _i^{(1)} \lambda _j^{(2)} \lambda _k^{(3)} \ge \lambda _i^{(1)} \lambda _j^{(2)} \lambda _{k+1}^{(3)}\), \(\lambda _i^{(1)} \lambda _j^{(2)} \lambda _k^{(3)} \ge \lambda _i^{(1)} \lambda _{j+1}^{(2)} \lambda _{k}^{(3)}\) and \(\lambda _i^{(1)} \lambda _j^{(2)} \lambda _k^{(3)} \ge \lambda _{i+1}^{(1)} \lambda _j^{(2)} \lambda _{k}^{(3)}\) define a semi-order among the eigenvalue products \(\lambda _i^{(1)} \lambda _j^{(2)} \lambda _k^{(3)}\), which are the eigenvalues associated with the columns of \(\varvec{U}\).

   \(\square \)

Regarding the selection of the dimension of tensor subspace, Theorem 1 implies the following theorem.

Theorem 2

The dimension of the subspace of the tensor space for data compression is \(\frac{1}{6}n(n+1)(n+2)\) if we select n principal components in each mode of three-way array data.

(Proof). For a positive integer n, the number \(s_n\) of the selected eigenvalues \(\lambda _i^{(1)}\lambda _j^{(2)}\lambda _k^{(3)}\) with \(i+j+k\le n-1\) is

$$\begin{aligned} s_n&= \#\{(i,j,k)\, |\, i+j+k\le n-1,\, i,j,k\ge 0 \} =\sum _{l=1}^{n}\sum _{m=1}^l m \nonumber \\&= \frac{1}{2}\left( \frac{1}{6}n(n+1)(2n+1)+\frac{1}{2}n(n+1)\right) =\frac{1}{6}n(n+1)(n+2). \end{aligned}$$
(30)

(Q.E.D)

If \(n=1,2,3,4\), we have \(s_n=1,4,10,20\), respectively, for tensors \(\mathcal {X}=((x_{ijk}))\) in \(\mathbb {R}^{I\times I\times I}\).
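A brute-force check of the counting argument in the proof:

```python
# The number of triples (i, j, k), i, j, k >= 0, with i + j + k <= n - 1
# equals n (n + 1) (n + 2) / 6.
for n in (1, 2, 3, 4, 5):
    s_n = sum(1 for i in range(n) for j in range(n) for k in range(n) if i + j + k <= n - 1)
    print(n, s_n, n * (n + 1) * (n + 2) // 6)   # s_n = 1, 4, 10, 20, 35
```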

Since the discrete cosine transform (DCT) [4] is asymptotically equivalent to the matrix of the K-L transform [13] for data observed from a first-order Markov model [12, 13], dimension reduction by PCA can be performed using the DCT as

$$\begin{aligned} f_{ijk}^n=\sum _{i'j'k'=0}^{n-1} g_{i'j'k'}\varphi _{i'i}\varphi _{j'j}\varphi _{k'k}, \, \ g_{ijk}=\sum _{i'j'k'=0}^{N-1} f_{i'j'k'}\varphi _{ii'}\varphi _{jj'}\varphi _{kk'} \end{aligned}$$
(31)

for \(n\le N\), where

$$\begin{aligned} \varvec{\varPhi }_{(N)} =\left( \left( \sqrt{\tfrac{2}{N}}\,\epsilon _i \cos \frac{(2j+1)i\pi }{2 N} \right) \right) =((\varphi _{ij})), \, \, \, \epsilon _i=\left\{ \begin{array}{ll} \frac{1}{\sqrt{2}} &{}\text{ if } i=0\\ 1&{}\text{ otherwise } \end{array}\right. \end{aligned}$$
(32)

is the DCT-II matrix of the order N. If we apply the fast cosine transform to the computation of the 3D-DCT-II matrix, the computational complexity is \(\mathcal {O}(3 n \log n)\).
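A sketch of DCT-based dimension reduction in the sense of Eq. (31), using the separable 3D DCT-II; the volume and the cut-off n are hypothetical, and scipy's orthonormal DCT-II is used in place of the matrix of Eq. (32):

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(7)
f = rng.standard_normal((32, 32, 32))          # a hypothetical volumetric pattern

g = dctn(f, type=2, norm='ortho')              # 3D DCT-II coefficients g_{ijk}
n = 8
g_low = np.zeros_like(g)
g_low[:n, :n, :n] = g[:n, :n, :n]              # keep the n x n x n low-frequency block
f_n = idctn(g_low, type=2, norm='ortho')       # reduced approximation f^n of Eq. (31)

print(np.linalg.norm(f - f_n) / np.linalg.norm(f))
```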

Since

$$\begin{aligned} vec( \varvec{u}\circ \varvec{v} \circ \varvec{w}) =\varvec{w} \otimes \varvec{v} \otimes \varvec{u}, \end{aligned}$$
(33)

the identity for outer products of vectors allows us to rewrite the DCT-based transform as

$$\begin{aligned} \mathcal {F}=\sum _{i,j,k=0}^{N-1} a_{ijk} \varvec{\varphi }_i \circ \varvec{\varphi }_j \circ \varvec{\varphi }_k, \, \, a_{ijk}=\langle \mathcal {F}, (\varvec{\varphi }_i \circ \varvec{\varphi }_j \circ \varvec{\varphi }_k)\rangle \end{aligned}$$
(34)

where

$$\begin{aligned} \varvec{\varPhi }^{(N)} =\left( \varvec{\varphi }_0,\varvec{\varphi }_1,\cdots ,\varvec{\varphi }_{N-1}\right) . \end{aligned}$$
(35)

4 Mutual Tensor Subspace Method

The mutual subspace method is stable against geometric perturbations of queries for pattern recognition. Furthermore, the constrained subspace method achieves robust recognition against global perturbations of queries. Therefore, as the combination of the two methods, the constrained mutual subspace method is both stable and robust against geometric and global perturbations of queries.

The angle \(\theta =\angle (\mathcal {A}, \mathcal {B})\) between two tensors \(\mathcal {A}\) and \(\mathcal {B}\) is computed as \(\cos \theta = \frac{\langle \mathcal {A}, \mathcal {B}\rangle }{|\mathcal {A}|_F |\mathcal {B}|_F}\). The angle between the two tensor subspaces defined by \(\{\varvec{U}^{(i)}\}_{i=1}^3\) and \(\{\varvec{V}^{(i)} \}_{i=1}^3\) is the extremal of the criterion

$$\begin{aligned} \cos ^2\theta =|\langle \mathcal {A} \times _1 \varvec{U}^{(1)} \times _2\varvec{U}^{(2)} \times _3\varvec{U}^{(3)}, \mathcal {B} \times _1 \varvec{V}^{(1)} \times _2\varvec{V}^{(2)} \times _3\varvec{V}^{(3)}\rangle |^2 \end{aligned}$$
(36)

with respect to the conditions \(|\mathcal {A}|_F=1\) and \(|\mathcal {B}|_F=1\). Therefore, the extremals of Eq. (36) are the solutions of the Euler-Lagrange equation

$$\begin{aligned} T_1=\cos ^2\theta +\lambda (1- |\mathcal {A}|_F^2)+\mu (1-|\mathcal {B}|_F^2) \end{aligned}$$
(37)

Thus, we have the system of equations

$$\begin{aligned}&\mathcal {A} \times _1\varvec{U}^{(1)} \times _2\varvec{U}^{(2)} \times _3\varvec{U}^{(3)} =\mu \mathcal {B}\end{aligned}$$
(38)
$$\begin{aligned}&\mathcal {B} \times _1\varvec{V}^{(1)} \times _2\varvec{V}^{(2)} \times _3\varvec{V}^{(3)} =\lambda \mathcal {A}, \end{aligned}$$
(39)

which leads to the tensor singular value problem

$$\begin{aligned}&\mathcal {A} \times _1\varvec{P}_1\times _2\varvec{P}_2 \times _3\varvec{P}_3 =\lambda \mu \mathcal {A}\end{aligned}$$
(40)
$$\begin{aligned}&\mathcal {B} \times _1\varvec{Q}_1\times _2\varvec{Q}_2 \times _3\varvec{Q}_3 =\lambda \mu \mathcal {B}, \end{aligned}$$
(41)

where \(\varvec{P}_i=\varvec{U}^{(i)}\varvec{V}^{(i)}\) and \(\varvec{Q}_i=\varvec{V}^{(i)}\varvec{U}^{(i)}\). The maximiser of Eq. (36) is

$$\begin{aligned} T_{\max } =\lambda _{\max }\mu _{\max } \end{aligned}$$
(42)

with the condition \(\lambda =\mu \). This mathematical property implies the following theorem.

Theorem 3

The canonical angle between a pair of tensor subspaces spanned by the triples of projection matrices \(\{\varvec{U}^{(i)}\}_{i=1}^3\) and \(\{\varvec{V}^{(i)}\}_{i=1}^3\) is \(\cos ^{-1}\sigma \), where \(\sigma \) is the maximal eigenvalue of the tensor transform \(\mathcal {X}\mapsto \mathcal {X}\times _1\varvec{P}_1\times _2\varvec{P}_2 \times _3\varvec{P}_3\).

To shrink the size of the linear problem, we evaluate the difference between the two subspaces using the perpendicular length between the normalised tensors.

For a collection of tensors \(\{\mathcal {X}_i \}_{i=1}^{M}\) with the condition \(\mathrm {E}(\mathcal {X}_i)=0\), we define a collection of categories of volumetric data. We assume that we have \(N_\mathrm {C}\) categories. For the kth category, we compute a system of orthogonal projections \( \{ \varvec{U}_{k}^{(j)}\}_{j=1,k=1}^{3,N_{\mathrm {C}}}\) to define the tensor subspaces, that is, a tensor subspace corresponds to a category of volumetric data.

Setting \(\{\mathcal {G}_{i'}\}_{i'=1}^{M'}\) to be queries, we compute the projected tensors

$$\begin{aligned} \mathcal {A}_{i'}= \mathcal {G}_{i'} \times _1\varvec{U}_{k}^{(1)\top }\times _2\varvec{U}_{k}^{(2)\top }\times _3\varvec{U}_{k}^{(3)\top }. \end{aligned}$$
(43)

Furthermore, assuming that the queries belong to one of the \(N_{\mathrm {C}}\) categories, we compute orthogonal projection matrices \(\{ \varvec{V}^{(j)} \}_{j=1}^3\) from a tensor subspace of the queries. This system of orthogonal projections yields the projected tensors

$$\begin{aligned} \mathcal {B}_{i'}= \mathcal {G}_{i'} \times _1\varvec{V}^{(1)\top }\times _2\varvec{V}^{(2)\top }\times _3\varvec{V}^{(3)\top }. \end{aligned}$$
(44)

Using a tensor subspace \(\mathcal {C}_k\) corresponding to a category and a tensor subspace \(\mathcal {C}_{\mathrm {q}}\) computed from queries, we define the dissimilarity of subspaces \(d(\mathcal {C}_k, \mathcal {C}_{\mathrm {q}})\) by

$$\begin{aligned} \mathrm {E}\left( |\mathcal {A}_{i'} \times _1 \varvec{P}\varvec{U}_{k}^{(1)} \times _2 \varvec{P}\varvec{U}_{k}^{(2)} \times _3 \varvec{P}\varvec{U}_{k}^{(3)}- \mathcal {B}_{i'} \times _1 \varvec{P}\varvec{V}^{(1)} \times _2 \varvec{P}\varvec{V}^{(2)} \times _3 \varvec{P}\varvec{V}^{(3)} |_{\mathrm {F}}^2 \right) , \end{aligned}$$
(45)

for \(\mathcal {A}_{i'}:=\mathcal {A}_{i'}/|\mathcal {A}_{i'}|_F\) and \(\mathcal {B}_{i'}:=\mathcal {B}_{i'}/|\mathcal {B}_{i'}|_F\), where the unitary matrix \(\varvec{P}\) selects bases for each mode of the tensors. For the dissimilarity of Eq. (45), the condition

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _{l} \ d(\mathcal {C}_l,\mathcal {C}_{\mathrm {q}}) = k, \end{aligned}$$
(46)

leads to the property that \( \{ \mathcal {G}_{i'} \}_{i'=1}^{M'} \in \mathcal {C}_k(\delta )\) for \(k, l = 1, 2, \dots , N_{\mathrm {C}}\).
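The following sketch assembles these steps for synthetic data, with the simplifying assumption \(\varvec{P}=\varvec{I}\) in Eq. (45) (all selected bases are kept); the ranks and the data are hypothetical:

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_factors(tensors, ranks):
    """Mode-wise factors U^(j) of a tensor subspace (Eq. (27)), truncated to `ranks`."""
    U = []
    for j in range(3):
        Mj = sum(unfold(X, j) @ unfold(X, j).T for X in tensors) / len(tensors)
        lam, vecs = np.linalg.eigh(Mj)
        U.append(vecs[:, np.argsort(lam)[::-1][:ranks[j]]])
    return U

def project(X, U):
    return np.einsum('abc,ai,bj,ck->ijk', X, U[0], U[1], U[2])

def reconstruct(Y, U):
    return np.einsum('ijk,ai,bj,ck->abc', Y, U[0], U[1], U[2])

rng = np.random.default_rng(8)
ranks = (3, 3, 3)
category = [rng.standard_normal((6, 7, 8)) for _ in range(40)]                   # training tensors
queries = [category[0] + 0.1 * rng.standard_normal((6, 7, 8)) for _ in range(5)]

Uk = mode_factors(category, ranks)       # tensor subspace of the category, Eq. (43)
V = mode_factors(queries, ranks)         # tensor subspace of the queries, Eq. (44)

# Eq. (45) with P = I: compare the normalised projections of each query onto the
# two tensor subspaces after mapping them back to the data space.
d = 0.0
for G in queries:
    A = project(G, Uk); A /= np.linalg.norm(A)
    B = project(G, V);  B /= np.linalg.norm(B)
    d += np.linalg.norm(reconstruct(A, Uk) - reconstruct(B, V)) ** 2
print("dissimilarity d(C_k, C_q):", d / len(queries))
```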

For the construction of the common space spanned by \(\{\varvec{P}_i^{(1)}\otimes \varvec{P}_i^{(2)}\otimes \varvec{P}_i^{(3)}\}_{i=1}^{N_C}\), we compute the orthogonal projection \(\varvec{Q}=\varvec{Q}^{(1)}\otimes \varvec{Q}^{(2)}\otimes \varvec{Q}^{(3)}\) which maximises the criterion

$$\begin{aligned} J_{\text{ CMS }1}=\sum _{i=1}^{N_C}|\varvec{Q}\varvec{P}_i|_F^2 \end{aligned}$$
(47)

with the conditions

$$\begin{aligned} \varvec{P}_i=\varvec{P}_i^{(1)}\otimes \varvec{P}_i^{(2)}\otimes \varvec{P}_i^{(3)} \end{aligned}$$
(48)

and

$$\begin{aligned} \varvec{Q}^{(j)\top }\varvec{Q}^{(j)}=\varvec{I}_{n_j}, \, \, \, j=1,2,3. \end{aligned}$$
(49)

The Euler-Lagrange equation of this conditional optimisation problem is

$$\begin{aligned} J_{\text{ CMS }2}=\sum _{i=1}^{N_C}|\varvec{Q}\varvec{P}_i|_F^2 +\sum _{j=1}^3\langle \varvec{\varLambda }^{(j)}(\varvec{I}_{m_j}-\varvec{Q}^{(j)^\top }\varvec{Q}^{(j)})\rangle \end{aligned}$$
(50)

As in the optimisation for tensor PCA, we solve a system of mode-wise optimisation problems.

In each mode, the orthogonal projection \(\varvec{Q}^{(j)}\), which maximises the criterion

$$\begin{aligned} J_{\text{ CMSM }}=\sum _{i=1}^{N_j}|\varvec{Q}^{(j)}\varvec{U}_i^{(j)}|_F^2, \end{aligned}$$
(51)

for the collection of orthogonal projections \(\{\varvec{U}_i^{(j)}\}_{i=1}^{N_j}\), approximates the common space of spaces spanned by \(\{\varvec{U}_i^{(j)}\}_{i=1}^{N_j}\).

The projection \(\varvec{Q}^{(j)}\) is the solution of the variational problem

$$\begin{aligned} J_{\text{ CMSM-EL }}=\sum _{i=1}^{N_j}|\varvec{Q}^{(j)}\varvec{U}_i^{(j)}|_F^2 +\langle (\varvec{I}_{m_{j}}-\varvec{Q}^{(j)\top }\varvec{Q}^{(j)}), \varvec{\Sigma }\rangle . \end{aligned}$$
(52)

Therefore, \(\varvec{Q}^{(j)}\) is the eigenmatrix of

$$\begin{aligned} \left( \sum _{i=1}^{N_j}\varvec{U}_{i}^{(j)}\right) \varvec{Q}^{(j)}=\varvec{Q}^{(j)}\varvec{\Sigma }^{(j)}. \end{aligned}$$
(53)
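A sketch of the mode-wise computation of \(\varvec{Q}^{(j)}\) from Eq. (53), with hypothetical category projections; the eigenvectors belonging to the largest eigenvalues approximate the common space, and their complement yields the constraint subspace:

```python
import numpy as np

rng = np.random.default_rng(9)
m_j, N_j, r, n_common = 10, 4, 3, 2

# Hypothetical mode-j projection matrices U_i^(j) of the N_j categories.
projections = []
for _ in range(N_j):
    B, _ = np.linalg.qr(rng.standard_normal((m_j, r)))
    projections.append(B @ B.T)

# Eq. (53): eigendecomposition of the sum of the category projections.
lam, Q = np.linalg.eigh(sum(projections))
Q = Q[:, np.argsort(lam)[::-1]]
Q_common = Q[:, :n_common]                 # basis of the approximated common space
print(np.sort(lam)[::-1])                  # large eigenvalues indicate shared directions
```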

Let \(\{\mathcal {X}_i^j\}_{i=1}^{M_j}\) be the elements of the category \(\mathcal {C}_j\). For index sets \(\varPi _k, \varPi _m, \varPi _n \in 2^{\mathcal {N}}\), where \(\mathcal {N}=\{0,1,2,\cdots ,N-1\}\), we define the minimisation criterion

$$\begin{aligned} J(\varPi _k,\varPi _m,\varPi _n) =E_i|\mathcal {X}_i^{j} -\mathcal {X}_i^{j} \times _1\varvec{P}_{\varPi _k}\varPhi _{(N)} \times _2\varvec{P}_{\varPi _m}\varPhi _{(N)} \times _3\varvec{P}_{\varPi _n}\varPhi _{(N)} |_F^2 \end{aligned}$$
(54)

for the selection of the ranges of the orthogonal projections \(\varvec{P}_{\varPi _k}\), \(\varvec{P}_{\varPi _m}\) and \(\varvec{P}_{\varPi _n}\). The minimisation of Eq. (54) defines the basis of the category \(\mathcal {C}_j\) as

$$\begin{aligned} \mathcal {L}(\mathcal {C}_j) =\{\varvec{\varphi }_p\}_{p\in \varPi _k} \otimes \{\varvec{\varphi }_q\}_{q\in \varPi _m} \otimes \{\varvec{\varphi }_r\}_{r\in \varPi _n} \end{aligned}$$
(55)

using the row vectors of the DCT-II matrix.

Setting

$$\begin{aligned} \bar{F}_{pqr}^{j}=E_i(F_{ipqr}^j), \end{aligned}$$
(56)

where \(F_{ipqr}^j\) is the (p, q, r)th DCT coefficient of the ith element of the category \(\mathcal {C}_j\), the ranges are selected as

$$\begin{aligned} \varPi _k\times \varPi _m\times \varPi _n =\{(p,q,r)\, |\, \bar{F}_{ pqr}^j\ge T \} \end{aligned}$$
(57)

for an appropriate threshold T. Since the low-frequency parts are common to almost all data, the canonicalisation of patterns is achieved by separating the data into low- and high-frequency parts. The separation of the constraint subspace \(\cap _{i=1}^n\mathcal {C}_i\) for \(\{\mathcal {C}_i\}_{i=1}^n\) is achieved by estimating the low-frequency parts common to the categories.
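A sketch of this frequency-range selection, using the magnitudes of the averaged DCT-II coefficients (a mild variant of the signed average of Eq. (56)); the category data and the threshold are hypothetical:

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(10)
category = [rng.standard_normal((16, 16, 16)) for _ in range(20)]   # hypothetical category data

# Eqs. (56)-(57): average the DCT-II coefficient magnitudes over the category and
# keep the index triples (p, q, r) whose average exceeds a threshold T.
F_bar = np.mean([np.abs(dctn(f, type=2, norm='ortho')) for f in category], axis=0)
T = np.quantile(F_bar, 0.95)                 # a hypothetical choice of the threshold
selected = np.argwhere(F_bar >= T)           # the index set Pi_k x Pi_m x Pi_n of Eq. (57)
print(len(selected), "selected frequency triples (p, q, r)")
```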

5 Conclusions

We have reviewed several fundamental and well-established methodologies of pattern recognition in vector space in order to unify data processing in higher-dimensional spaces using tensor expressions and multi-way principal component analysis.

In traditional methods of medical image analysis, the outline shapes of objects such as organs and the statistical properties of interior textures are extracted independently using separate methods. However, tensor PCA for volumetric data allows us to extract simultaneously both the outline shapes of volumetric objects and the statistical properties of their interior textures from data projected onto a low-dimensional linear subspace spanned by tensors. We also showed frequency-based pattern recognition methodologies for the tensor subspace method.

This research was supported by the “Multidisciplinary Computational Anatomy and Its Application to Highly Intelligent Diagnosis and Therapy” project funded by a Grant-in-Aid for Scientific Research on Innovative Areas from MEXT, Japan, and by Grants-in-Aid for Scientific Research funded by the Japan Society for the Promotion of Science.