
FF-SKPCCA: Kernel probabilistic canonical correlation analysis

Published in: Applied Intelligence

Abstract

Several information fusion methods have been developed to increase recognition accuracy in multimodal systems. Canonical correlation analysis (CCA), cross-modal factor analysis (CFA) and their kernel versions are known as successful fusion techniques, but they cannot capture the variability of the data. Probabilistic CCA (PCCA) has been suggested as a linear fusion method that captures input variability. A new kernel PCCA (KPCCA) is proposed here to capture both the nonlinear correlations between sources and the input variability. The performance of KPCCA degrades as the number of samples, which determines the size of the kernel matrix, increases. In conventional fusion methods the latent variables of the different modalities are concatenated; consequently, a large-scale covariance matrix must be estimated from just a limited number of samples. To overcome this drawback, a sparse KPCCA (SKPCCA) is introduced, which sacrifices covariance matrix elements at the cost of decreasing its rank. In the final stage of the gradual evolution of KPCCA, a new feature fusion scheme is proposed for SKPCCA (FF-SKPCCA) as a second-stage fusion. The proposed method unifies the latent variables of two modalities into a feature vector of acceptable size. Audio-visual databases, namely M2VTS (for speech recognition) and eNTERFACE and RML (for emotion recognition), are used to assess FF-SKPCCA against state-of-the-art fusion methods. The comparative results indicate the superiority of the proposed method in most cases.


References

  1. Shivappa S, Trivedi M, Rao B (2010) Audiovisual information fusion in human computer interfaces and intelligent environments: A survey. Proc IEEE 98(10):1692–1715


  2. Zeng Z, Pantic M, Roisman G I, Huang T S (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans Pattern Anal Mach Intell 31(1):39–58

  3. Ayadi M E, Kamel M, Karray F (2011) Survey on speech emotion recognition: features, classification schemes and databases. Pattern Recogn 44(3):572–587

  4. Atrey P K, Hossain M A, El Saddik A, Kankanhalli M S (2010) Multimodal fusion for multimedia analysis: a survey. Multimed Syst 16(6):345–379

  5. Galatas G, Potamianos G, Makedon F (2012) Audio-visual speech recognition incorporating facial depth information captured by the Kinect. In: Proceedings of the 20th European Signal Processing Conference (EUSIPCO), pp 2714–2717

  6. Gupta R, Malandrakis N, Xiao B, Guha T, Van Segbroeck M, Black M, Potamianos A, Narayanan S (2014) Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions. In: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pp. 33–40. Orlando Florida, USA: ACM

  7. Taouche C, Batouche M C, Berkane M, Taleb-Ahmed A (2014) Multimodal biometric systems. In: International Conference on Multimedia Computing and Systems (ICMCS), pp 301–308

  8. Xu C, Hero A O, Savarese S (2012) Multimodal video indexing and retrieval using directed information. IEEE Trans Multimedia:3–16

  9. Ercan A O, Gamal A E, Guibas L J (2013) Object tracking in the presence of occlusions using multiple cameras: a sensor network approach. ACM Trans Sen Netw:16:1–16:36

  10. Wagner J, Andre E, Lingenfelser F, Jonghwa K (2011) Exploring fusion methods for multimodal emotion recognition with missing data. IEEE Trans Affect Comput:206–218

  11. Wang Y, Guan Y (2008) Recognizing human emotional state from audiovisual signals. IEEE Trans Multimedia 10(5):936–946


  12. Wang Y, Guan Y, Venetsanopoulos A N (2012) Kernel cross-modal factor analysis for information fusion with application to bimodal emotion recognition. IEEE Trans Multimedia:597–607

  13. Li B, Qi L, Gao L (2014) Multimodal emotion recognition based on kernel canonical correlation analysis

  14. Hotelling H (1936) Relations between two sets of variates. Biometrika:321–377

  15. Li D, Dimitrova N, Li N, Sethi I K (2003) Multimedia content processing through cross-modal association. In: Proceedings ACM International Conference, pp 604–611

  16. Bredin H, Chollet G (2007) Audio-visual speech synchrony measure for talking-face identity verification. In: Acoustics, Speech and Signal Processing, ICASSP 2007, pp II–233

  17. Abo-Zahhad M, Ahmed S M, Abbas S N (2014) PCG biometric identification system based on feature level fusion using canonical correlation analysis. In: 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE), pp 1–6

  18. Metallinou A, Lee S, Narayanan S (2010) Decision level combination of multiple modalities for recognition and analysis of emotional expression. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp 2462–2465

  19. Li D, Taskiran C, Dimitrova N, Wang W, Li M, Sethi I K (2005) Cross-modal analysis of audio-visual programs for speaker detection. In: Proceedings IEEE Workshop Multimedia Signal Process., Shanghai, China, pp 1–4

  20. Kumar K, Potamianos G, Navratil J, Marcheret E, Libal V (2011) Audio-visual speech synchrony detection by a family of bimodal linear prediction models. Multibiometrics for Human Identification:31–50

  21. Lai P L, Fyfe C (2000) Kernel and nonlinear canonical correlation analysis. Int J Neural Syst 10(5):365–377

  22. Shi Y, Ji H (2014) Kernel canonical correlation analysis for specific radar emitter identification. Electron Lett:1318–1320

  23. Chetty G, Göcke R, Wagner M (2009) Audio-Visual mutual dependency models for biometric liveness checks. AVSP 2009, Norwich, pp. 32–37

  24. Bach F, Jordan M I (2005) A probabilistic interpretation of canonical correlation analysis. Technical Report 688 Department of Statistics, University of California, Berkeley

  25. Archambeau C, Bach F R (2009) Sparse probabilistic projections. Adv Neural Inf Proces Syst 21:73–80


  26. Klami A, Virtanen S, Kaski S (2010) Bayesian exponential family projections for coupled data sources. In: 26th Conference on Uncertainty in Artificial Intelligence (UAI), pp 286–293

  27. Koskinen M, Viinikanoja J, Kurimo M, Klami A, Kaski S, Hari R (2013) Identifying Fragments of natural speech from the listener’s MEG signals. Hum Brain Mapp 34(6):1477–1489


  28. Rudovic O, Petridis S, Pantic M (2013) Bimodal log-linear regression for fusion of audio and visual features. 21st ACM Int Conf Multimedia:789–792

  29. Wu C H, Lin J C, Wei W L (2014) Survey on audiovisual emotion recognition: databases, features, and data fusion strategies. APSIPA Transactions on Signal and Information Processing, e12

  30. Hardoon D, Szedmak S, Shawe-taylor J (2004) Canonical correlation analysis: An overview with application to learning methods. Neural Comput:2639–2664

  31. Blaschko M, Lampert C H (2008) Correlational spectral clustering. IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008:1–8

  32. Golub G H, Hansen P C, O'Leary D P (1999) Tikhonov regularization and total least squares. SIAM J Matrix Anal Appl 21(1):185–194

  33. Rohani R, Sobhanmanesh F, Alizadeh S, Boostani R (2011) Lip processing and modeling based on spatial fuzzy clustering in color images. Int J Fuzzy Syst 13(2):65–73


  34. Hermansky H, Hanson B A, Wakita H (1985) Perceptually-based linear predictive analysis of speech. In: Proceedings IEEE ICASSP, vol 2, pp 509–512

  35. Bartlett A, Evans V, Frenkel I, Hobson C, Sumera E (2004) Digital Hearing Aids [Online]. Available www.clear.rice.edu/elec301/Projects01/dig_hear_aid

  36. Wu C H, Lin J C, Wei W L (2013) Two-level hierarchical alignment for semi-coupled HMM-based audiovisual emotion recognition with temporal course. IEEE Trans Multimedia:1880–1895

  37. Jiang D, Cui Y, Zhang X, Fan P, Gonzalez I, Sahli H (2011) Audiovisual emotion recognition based on triple-stream dynamic Bayesian network models. Affective Computing and Intelligent Interaction:609–618

  38. Sing V, Shokeen V, Singh B (2013) Face detection by Haar cascade classifier with simple and complex backgrounds images using OpenCV implementation. International Journal of Advanced Technology in Engineering and Science:33–38

  39. Lyons M J, Budynek J, Plante A, Akamatsu S (2000) Classifying facial attributes using a 2-D Gabor wavelet representation and discriminant analysis. 4th Int Conf Automatic Face and Gesture Recognition:202–207

  40. Manjunath B S, Ma W Y (1996) Texture features for browsing and retrieval of image data. IEEE Trans Pattern Anal Machine Intell 18(8):837–842

  41. Pigeon S, Vandendorpe L (1997) The M2VTS multimodal face database (release 1.00). In: Audio- and Video-Based Biometric Person Authentication. Springer, Berlin Heidelberg, pp 403–409


  42. Martin O, Kotsia I, Macq B, Pitas I (2006) The eNTERFACE'05 audio-visual emotion database. In: Proc. ICDEW, p 8

  43. Ekman P, Friesen W V (1975) Pictures of facial affect. Consulting Psychologists Press

  44. Ekman P (1993) Facial expression and emotion. Am Psychol 48(4):384–392

  45. Sun Q S, Zeng S G, Liu Y, Heng P A, Xia D S (2005) A new method of feature fusion and its application in image recognition. Pattern Recogn:2437–2448

  46. Tipping M E, Bishop C M (1999) Probabilistic principal component analysis. Journal of the Royal Statistical Society B 61(3):611–622


  47. Li Y O, Eichele T, Calhoun V D, Adali T (2012) Group study of simulated driving fMRI data by multiset canonical correlation analysis. Journal of signal processing systems:31–48

  48. Lin J C, Wu C H, Wei W L (2012) Error weighted semi-coupled hidden Markov model for audio-visual emotion recognition. IEEE Trans Multimedia:142–156

  49. Mello S, Kory J (2012) Consistent but modest: a meta-analysis on unimodal and multimodal affect detection accuracies from 30 studies. In: Proceedings of the 14th ACM international conference on Multimodal interaction, pp 31–38

  50. Morrison D, Wang R, De Silva L C (2007) Ensemble methods for spoken emotion recognition in call-centres. Speech Commun:98–112

  51. Muramatsu D, Iwama H, Makihara Y, Yagi Y (2013) Multi-view multi-modal person authentication from a single walking image sequence. 2013 International Conference on Biometrics (ICB):1–8


Acknowledgments

The authors acknowledge Dr. Homayounpour, professor at AmirKabir University, for allowing us to use the M2VTS dataset in developing the experimental results.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Reza Boostani.

Appendix A


1.1 A-1. CCA Method

Canonical correlation analysis (CCA) is a statistical method proposed by Hotelling [14] to find a shared structure between two sources of data. CCA is closely related to the mutual information method [45], though the two differ in their objective functions. The method considers a pair of zero-mean feature vector sequences as follows:

$$\begin{array}{@{}rcl@{}} \left( \mathbf{x},\mathbf{y} \right)=\left\{ \left( x_{1},y_{1} \right),\left( x_{2},y_{2} \right),\mathellipsis ,\left( x_{n},y_{n} \right) \right\} \end{array} $$
(15)

where \(x_{i}\) and \(y_{i}\) are the observation data (original features) of the two modalities, with dimensions p and q, respectively. CCA seeks two transformation matrices \(\mathbf{W}_{x}\) and \(\mathbf{W}_{y}\) of dimensions p×d and q×d, where d ≤ min(p, q). The original features of the modalities are projected onto the correlation subspace by \(\mathbf{W}_{x}\) and \(\mathbf{W}_{y}\) such that the correlation between \(\hat {x}=\mathbf {x}\mathbf {W}_{x}\) and \(\hat {y}=\mathbf {y}\mathbf {W}_{y}\) is maximized. Maximizing the correlation between the projected feature vectors \(\hat {x}\) and \(\hat {y}\) is equivalent to maximizing the correlation coefficient ρ between them:

$$\begin{array}{@{}rcl@{}} \mathrm{\rho }\!&=&\!\max\limits_{\mathbf{W}_{x},\mathbf{W}_{y}}\frac{E\left[ \hat{x}^{\mathbf{T}}\hat{y} \right]}{\sqrt {E\left[ \hat{x}^{\mathbf{2}} \right]E\left[ \hat{y}^{\mathbf{2}} \right]} } \end{array} $$
$$\begin{array}{@{}rcl@{}} \!&=&\!\max\limits_{\mathbf{W}_{x},\mathbf{W}_{y}} \frac{E\left[ \mathbf{W}_{\mathbf{x}}^{\mathbf{T}}\mathbf{x}^{\mathbf{T}}\mathbf{y}\mathbf{W}_{\mathbf{y}} \right]}{\sqrt {E\left[ \mathbf{W}_{\mathbf{x}}^{\mathbf{T}}\mathbf{x}^{\mathbf{T}}\mathbf{x}\mathbf{W}_{\mathbf{x}} \right]E\left[ \mathbf{W}_{\mathbf{y}}^{\mathbf{T}}\mathbf{y}^{\mathbf{T}}\mathbf{y}\mathbf{W}_{\mathbf{y}} \right]}} \end{array} $$
$$\begin{array}{@{}rcl@{}} &=&\max\limits_{\mathbf{W}_{x},\mathbf{W}_{y}}\frac{\mathbf{W}_{x}^{T}C_{xy}\mathbf{W}_{y}}{\sqrt {\mathbf{W}_{x}^{T}C_{xx}\mathbf{W}_{x}\mathbf{W}_{y}^{T}C_{yy}\mathbf{W}_{y}} } \end{array} $$
(16)

where \(C_{xy}\) is the cross-covariance matrix of \((\mathbf{x},\mathbf{y})\), and \(C_{xx}\), \(C_{yy}\) are the covariance matrices of x and y, respectively.

The above problem can be solved as an eigenvalue problem:

$$\begin{array}{@{}rcl@{}} C_{xx}^{\mathrm{-1}}C_{xy}C_{yy}^{\mathrm{-1}}C_{yx}\mathbf{W}_{x}\mathrm{=}\rho^{\mathrm{2}}\mathbf{W}_{x} \end{array} $$
(17)
$$\begin{array}{@{}rcl@{}} C_{yy}^{-1}C_{yx}C_{xx}^{-1}C_{xy}\mathbf{W}_{y}= \rho^{2}\mathbf{W}_{y} \end{array} $$
(18)
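As a concrete illustration, the eigenvalue problems (17)–(18) can be sketched in a few lines of NumPy. The helper below is our own minimal sketch, not the authors' implementation: the function name, the small ridge term added for numerical invertibility, and the rescaling of \(\mathbf{W}_{y}\) are assumptions.

```python
import numpy as np

def cca(X, Y, d, ridge=1e-8):
    """Linear CCA via the eigenvalue problems (17)-(18).

    X (n, p) and Y (n, q) are zero-mean data matrices.  Returns the first d
    canonical directions Wx (p, d), Wy (q, d) and the correlations rho (d,).
    """
    n = X.shape[0]
    Cxx = X.T @ X / n + ridge * np.eye(X.shape[1])  # ridge keeps Cxx invertible
    Cyy = Y.T @ Y / n + ridge * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Cxx^{-1} Cxy Cyy^{-1} Cyx Wx = rho^2 Wx   (Eq. 17)
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:d]
    rho = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
    Wx = vecs.real[:, order]
    # Wy is proportional to Cyy^{-1} Cyx Wx (pairing of Eq. 18 with Eq. 17)
    Wy = np.linalg.solve(Cyy, Cxy.T) @ Wx / np.maximum(rho, 1e-12)
    return Wx, Wy, rho
```

On a toy pair of modalities that share one latent signal, the leading canonical correlation recovered this way approaches the true signal correlation.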

1.2 A-2. CFA Method

The cross-modal factor analysis (CFA) method was proposed in [15]. The features from the different modalities are treated as two subsets, and the patterns shared between these subsets are discovered. The method assumes that a pair of normalized, zero-mean feature vectors x and y are linearly projected into a joint space by the transforms \(\mathbf{W}_{x}\) and \(\mathbf{W}_{y}\) such that the following criterion is minimized:

$$\begin{array}{@{}rcl@{}} {\underset{{W}_{x},{W}_{y}}{\min}}\left\| \mathbf{x}\mathbf{W}_{x}-\mathbf{y}\mathbf{W}_{y} \right\|_{F}^{2} \end{array} $$
(19)

where, \(\mathbf {W}_{x}^{T}\mathbf {W}_{x}\) and \(\mathbf {W}_{y}^{T}\mathbf {W}_{y}\) are unit matrices and F is the Frobenius norm and is calculated by \(\left \| \mathbf {W} \right \|_{F}=\sqrt {\sum \nolimits _{ij} w_{ij}^{2}}\).

Solving the above equation for the optimal transformation matrices \(\mathbf{W}_{x}\) and \(\mathbf{W}_{y}\) by decomposing the cross-covariance matrix \(C_{xy}\) with the singular value decomposition (SVD) gives:

$$\begin{array}{@{}rcl@{}} C_{xy}=S_{xy}{\Lambda}_{xy}D_{xy} \end{array} $$
(20)

Consequently,

$$\begin{array}{@{}rcl@{}} \mathbf{W}_{x}= S_{xy},\qquad \mathbf{W}_{y}=D_{xy} \end{array} $$
(21)
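In practice, (20)–(21) reduce to a single SVD of the cross-covariance matrix. The sketch below is our own minimal helper, not the authors' code; it returns transforms with orthonormal columns in the sense required by (19).

```python
import numpy as np

def cfa(X, Y):
    """CFA transforms from Eq. (20)-(21): SVD of the cross-covariance C_xy.

    X (n, p), Y (n, q) are zero-mean data matrices.  Wx holds the left
    singular vectors S_xy and Wy the corresponding right singular vectors,
    so X @ Wx and Y @ Wy are aligned in the Frobenius sense of Eq. (19).
    """
    Cxy = X.T @ Y / X.shape[0]
    S, _, Dt = np.linalg.svd(Cxy, full_matrices=False)
    return S, Dt.T  # Wx = S_xy; Wy holds the columns of D_xy transposed
```

Both returned matrices have orthonormal columns, matching the constraint that \(\mathbf{W}_{x}^{T}\mathbf{W}_{x}\) and \(\mathbf{W}_{y}^{T}\mathbf{W}_{y}\) are identity matrices.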

1.3 A-3. Probabilistic CCA

To deal with uncertainty in the performance of CCA, probabilistic CCA (PCCA) was introduced in [24]; its projected latent variables provide maximum variance in the joint correlation space. To this end, a Gaussian model is defined for each source of data:

$$\begin{array}{@{}rcl@{}} z&\sim& \mathcal{N}\left( {\mathrm{0,}I}_{d} \right)\mathrm{ 1\le }d\mathrm{\le }\min \left( p\mathrm{,}q \right) \end{array} $$
$$\begin{array}{@{}rcl@{}} \mathbf{x\vert }z&\sim&\mathcal{N}\left( {z\mathbf{W}}_{x}^{T} +\mu_{x},\varphi_{x} \right) \end{array} $$
$$\begin{array}{@{}rcl@{}} \mathbf{y\vert }z&\sim&\mathcal{N}({z\mathbf{W}}_{y}^{T} +\mu_{y},\varphi_{y}) \end{array} $$
(22)

where z is the latent variable shared between the two modalities x and y, and μ and φ are the mean and covariance of each modality, respectively. Maximizing the probability functions requires minimizing \(\varphi_{x}\) and \(\varphi_{y}\). By defining \(\mathbf {O}=\left [ {\begin {array}{*{20}c} \mathbf {x}\\ \mathbf {y}\\ \end {array} } \right ]\), \(\mathbf{W}=[\mathbf{W}_{x}\;\mathbf{W}_{y}]\), \(\mu \mathbf {=}\left [ {\begin {array}{*{20}c} \mu _{x}\\ \mu _{y}\\ \end {array} } \right ]\) and \(\varphi \mathbf {=}\left [ {\begin {array}{*{20}c} \varphi _{x} & \mathbf {0}\\ \mathbf {0} & \varphi _{y}\\ \end {array} } \right ]\), the two probabilistic models are merged into the following joint probability function:

$$\begin{array}{@{}rcl@{}} \mathbf{p}\left( \mathbf{O,z} \right)=\mathbf{p}(\mathbf{O}\vert \mathbf{z})\mathbf{p}(z)= N\left( z\mathbf{W}^{T} +\mu ,\varphi \right)\mathrm{N}\left( {\mathrm{0,}I}_{d} \right) \end{array} $$
(23)

The noise covariances and the posterior expectations of z, given x and y separately, are:

$$\begin{array}{@{}rcl@{}} {\varphi }_{x}&=&C_{xx}-\mathbf{W}_{x}\mathbf{W}_{x}^{T} \end{array} $$
$$\begin{array}{@{}rcl@{}} {\varphi }_{y}&=&C_{yy}-\mathbf{W}_{y}\mathbf{W}_{y}^{T} \end{array} $$
$$\begin{array}{@{}rcl@{}} E\left( z \vert \mathbf{x}\right) &=& \mathbf{x}\mathbf{W}_{x}M_{x}^{-1}, M_{x}=I+ \mathbf{W}_{x}^{T}\varphi_{x}^{-1}\mathbf{W}_{x} \end{array} $$
$$\begin{array}{@{}rcl@{}} E\left( z \vert \mathbf{y}\right) &=& \mathbf{y}\mathbf{W}_{y}M_{y}^{-1} , M_{y}=I+ \mathbf{W}_{y}^{T}\varphi_{y}^{-1}\mathbf{W}_{y} \end{array} $$
(24)

where \(\mathbf{W}_{x}\) and \(\mathbf{W}_{y}\) contain the first d canonical directions of x and y, and \(C_{xx}\) and \(C_{yy}\) are the covariance matrices of x and y, respectively.

A unified latent variable can also be identified given both x and y:

$$\begin{array}{@{}rcl@{}} E\left( z \vert {\mathbf{x,y}}\right) \!\!&=&\!\!\left[ {\begin{array}{*{20}c} \mathbf{x}\mathbf{W}_{x} & \mathbf{y}\mathbf{W}_{y}\\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {(I\,-\,{P_{d}^{2}})}^{-1} & {(I\,-\,{P_{d}^{2}})}^{-1}P_{d}\\ {(I\,-\,{P_{d}^{2}})}^{-1}P_{d} & {(I\,-\,{P_{d}^{2}})}^{-1}\\ \end{array} } \right]\\ &&\times\left[ {\begin{array}{*{20}c} M_{x}^{-1}\\M_{y}^{-1}\\ \end{array} } \right] \end{array} $$
(25)

where, \(P_{d}=M_{x}^{-1}\ast {(M_{y}^{-1})}^{T}\).

Another solution for (22) is based on the expectation–maximization (EM) algorithm. Similarly, probabilistic principal component analysis (PPCA) was proposed in [46]; it iterates the following EM steps.

  • Expectation-step: finds the sufficient statistics of the latent variables given the current estimated parameter:

    $$\begin{array}{@{}rcl@{}} M_{t}&=&I+ \mathbf{W}_{t}^{T}\varphi_{t}^{-1}\mathbf{W}_{t} \end{array} $$
    $$\begin{array}{@{}rcl@{}} E(z_{t})&=& M_{t}^{-1}\mathbf{W}_{t}\varphi_{t}^{-1}\mathbf{O} \end{array} $$
    $$\begin{array}{@{}rcl@{}} E(z_{t}{z_{t}^{T}})&=&M_{t}^{-1}+E(z_{t}){E(z_{t})}^{T} \end{array} $$
    (26)

    where the subscript t indicates the iteration number.

  • Maximization-step: updates the estimated parameter to maximize the likelihood function:

$$\begin{array}{@{}rcl@{}} \mathbf{W}_{\mathbf{t+1}}&=&\left[ \mathbf{O}{E(z_{t})}^{T} \right]\left[ E(z_{t}{z_{t}^{T}}) \right]^{-1} \end{array} $$
$$\begin{array}{@{}rcl@{}} \mathrm{\varphi }_{t+1}&=&\mathbf{O}\mathbf{O}^{\mathbf{T}}-2\mathbf{O}{E(z_{t})}^{T}\mathbf{W}_{t+1}^{T}\\ &&+ ~\mathbf{trace}(E(z_{t}{z_{t}^{T}}){\mathbf{W}_{t+1}^{T}\mathbf{W}}_{\mathbf{t+1}}\mathbf{)} \end{array} $$
(27)

Inserting (26) into (27) provides a general solution for the PCCA scheme, yielding the following update equations:

$$\begin{array}{@{}rcl@{}} \mathbf{W}_{\mathbf{t}\mathrm{\mathbf{+1}}}\!\!&=&\!\!C\mathrm{\varphi }_{t}^{\mathrm{-1}}\mathbf{W}_{\mathbf{t}}M_{t}^{\mathrm{-1}}\left( M_{t}^{\mathrm{-1}}\mathrm{\!+}M_{t}^{\mathrm{-1}}\mathbf{W}_{t}^{T}\mathrm{\varphi }_{t}^{\mathrm{-1}}C\mathrm{\varphi }_{t}^{\mathrm{-1}}\mathbf{W}_{\mathbf{t}}M_{t}^{\mathrm{-1}} \right)^{\mathrm{-1}} \end{array} $$
$$\begin{array}{@{}rcl@{}} \mathrm{\varphi }_{t+1}\!\!&=&\!\! \left( {\begin{array}{*{20}c} \left( C\,-\,C\mathrm{\varphi }_{t}^{-1}\mathbf{W}_{\mathbf{t}}M_{t}^{-1}\mathbf{W}_{t+1}^{T} \right)_{11} & 0\\ 0 & \left( C\,-\,C\mathrm{\varphi }_{t}^{-1}\mathbf{W}_{\mathbf{t}}M_{t}^{-1}\mathbf{W}_{t+1}^{T} \right)_{22}\\ \end{array} } \right)\\ \end{array} $$
(28)

where \(M_{t}\mathrm {=}I\mathrm {+ }\mathbf {W}_{t}^{T}\mathrm {\varphi }_{t}^{\mathrm {-1}}\mathbf {W}_{t}\) .
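Concretely, one round of (26)–(27) is standard factor-analysis EM applied to the stacked observation O, with φ constrained to PCCA's block-diagonal form. The sketch below reflects our reading of these updates (zero-mean data, samples stored as columns, and the symmetrization of the residual are our assumptions, not the authors' implementation):

```python
import numpy as np

def pcca_em_step(O, W, Phi, p):
    """One EM iteration in the spirit of Eq. (26)-(27).

    O (D, n): stacked zero-mean observations, D = p + q; W (D, d): loadings;
    Phi (D, D): block-diagonal noise covariance.  Returns updated (W, Phi).
    """
    D, n = O.shape
    d = W.shape[1]
    Phi_inv = np.linalg.inv(Phi)
    M = np.eye(d) + W.T @ Phi_inv @ W                # Eq. (26)
    Ez = np.linalg.solve(M, W.T @ Phi_inv @ O)       # E[z_t], one column per sample
    Ezz = n * np.linalg.inv(M) + Ez @ Ez.T           # sum over t of E[z_t z_t^T]
    W_new = (O @ Ez.T) @ np.linalg.inv(Ezz)          # Eq. (27), loading update
    C = O @ O.T / n
    R = C - W_new @ (Ez @ O.T) / n                   # residual covariance
    R = (R + R.T) / 2                                # enforce symmetry
    Phi_new = np.zeros_like(Phi)                     # keep the block-diagonal form
    Phi_new[:p, :p] = R[:p, :p]
    Phi_new[p:, p:] = R[p:, p:]
    return W_new, Phi_new
```

As with any EM scheme, each iteration cannot decrease the marginal Gaussian log-likelihood of O under the model covariance \(\mathbf{W}\mathbf{W}^{T}+\varphi\).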

1.4 A-4. KCCA Method

Kernel canonical correlation analysis (KCCA) [21] is the kernelized version of CCA: it maps the data into a higher-dimensional feature space and applies CCA there in order to find nonlinear correlations between the two modalities. Let ϕ and ψ be two functions that map the input data into a higher-dimensional space:

$$\begin{array}{@{}rcl@{}} \left( \mathrm{\phi \mathbf{(x)}},\psi (\mathrm{\mathbf{y)}} \right)&=&\left\{ \left( \mathrm{\phi (}\mathrm{x}_{1}),\psi (\mathrm{y}_{1}) \right),\left( \mathrm{\phi (}\mathrm{x}_{2}),\psi (\mathrm{y}_{2}) \right),\right.\\ &&\left.\mathellipsis ,\left( \mathrm{\phi (}\mathrm{x}_{n}),\psi (\mathrm{y}_{n}) \right) \right\} \end{array} $$
(29)

KCCA seeks two coefficient matrices α and β that appear in the following equations:

$$\begin{array}{@{}rcl@{}} \mathbf{W}_{x}={\mathrm{\phi }(\mathrm{\mathbf{x}})}^{T}\mathbf{\alpha } \end{array} $$
(30)
$$\begin{array}{@{}rcl@{}} \mathbf{W}_{y}={\mathrm{\psi }(\mathrm{\mathbf{y}})}^{T}\mathbf{\beta } \end{array} $$
(31)

This means that \(\mathbf{W}_{x}\) and \(\mathbf{W}_{y}\) are the projections of ϕ(x) and ψ(y) onto α and β, respectively. Substituting ϕ and ψ into (16), the correlation function becomes:

$$\begin{array}{@{}rcl@{}} \mathrm{\rho } \!\!&=&\!\! {\underset{\boldsymbol{\alpha,\beta}}{\max}}\frac{E\left[ \mathbf{\alpha }^{T}\mathrm{\phi }\left( \mathrm{x} \right){\mathrm{.\phi }\left( \mathrm{x} \right)}^{T}\mathrm{\psi }\left( \mathrm{y} \right){\mathrm{.\psi }\left( \mathrm{y} \right)}^{T}\mathbf{\beta } \right]}{\sqrt {E\left[ \mathbf{\alpha }^{T}\mathrm{\phi }\left( \mathrm{x} \right){\mathrm{.\phi }\left( \mathrm{x} \right)}^{T}\mathrm{\phi }\left( \mathrm{x} \right){\mathrm{.\phi }\left( \mathrm{x} \right)}^{T}\mathbf{\alpha } \right]E\left[ \mathbf{\beta }^{T}\mathrm{\psi }\left( \mathrm{y} \right){\mathrm{.\psi }\left( \mathrm{y} \right)}^{T}\mathrm{\psi }\left( \mathrm{y} \right){\mathrm{.\psi }\left( \mathrm{y} \right)}^{T}\mathbf{\beta } \right]} } \end{array} $$
$$\begin{array}{@{}rcl@{}} \!\!&=&\!\!{\underset{\alpha,\beta}{\max}}\frac{\mathbf{\alpha }^{T}\mathbf{K}_{\mathrm{x}}\mathbf{K}_{\mathrm{y}}\mathbf{\beta }}{\sqrt {\left[ \mathbf{\alpha }^{T}\mathbf{K}_{\mathrm{x}}\mathbf{K}_{\mathrm{x}}\mathbf{\alpha } \right]\left[ \mathbf{\beta }^{T}\mathbf{K}_{\mathrm{y}}\mathbf{K}_{\mathrm{y}}\mathbf{\beta } \right]} } \end{array} $$
(32)

where \(\mathbf{K}_{x}=E[\phi(\mathbf{x})\phi(\mathbf{x})^{T}]\) and \(\mathbf{K}_{y}=E[\psi(\mathbf{y})\psi(\mathbf{y})^{T}]\).

This optimization problem can be solved through generalized eigenvalue decomposition. When the kernel matrices are non-invertible, a conventional regularization technique can be applied, which yields the following equation [30]:

$$\begin{array}{@{}rcl@{}} {\underset{\alpha,\beta}{\max}}\frac{\boldsymbol{\alpha }^{T}\mathbf{K}_{\mathrm{x}}\mathbf{K}_{\mathrm{y}}\boldsymbol{\beta }}{\sqrt {\left[ \boldsymbol{\alpha }^{T}\mathrm{(}\mathbf{K}_{\mathrm{x}}^{\mathrm{2}}\mathrm{ +\tau }\mathbf{K}_{\mathrm{x}}\mathrm{)}\boldsymbol{\alpha } \right]\left[ \boldsymbol{\beta }^{T}\mathrm{(}\mathbf{K}_{\mathrm{y}}^{\mathrm{2}}\mathrm{ +\tau }\mathbf{K}_{\mathrm{y}}\mathrm{)}\boldsymbol{\beta } \right]} } \end{array} $$
(33)

where, 0≤τ≤1.
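In the dual form, (33) turns into a generalized eigenvalue problem over the Gram matrices. The following is a hedged sketch of that computation (the jitter term and the recovery of β from α are our own choices, not part of [21] or [30]):

```python
import numpy as np

def kcca(Kx, Ky, tau=0.1, d=1, jitter=1e-6):
    """Regularized KCCA, Eq. (33): dual coefficients alpha, beta for
    centered Gram matrices Kx, Ky (n, n), plus the correlations rho."""
    n = Kx.shape[0]
    Rx = Kx @ Kx + tau * Kx + jitter * np.eye(n)   # regularized denominators
    Ry = Ky @ Ky + tau * Ky + jitter * np.eye(n)
    # Rx^{-1} Kx Ky Ry^{-1} Ky Kx alpha = rho^2 alpha  (stationarity of Eq. 33)
    M = np.linalg.solve(Rx, Kx @ Ky) @ np.linalg.solve(Ry, Ky @ Kx)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:d]
    rho = np.sqrt(np.clip(vals.real[order], 0.0, None))
    alpha = vecs.real[:, order]
    # beta follows from the paired stationarity condition, up to scale
    beta = np.linalg.solve(Ry, Ky @ Kx) @ alpha / np.maximum(rho, 1e-12)
    return alpha, beta, rho
```

With linear kernels \(K_{x}=XX^{T}\) and \(K_{y}=YY^{T}\) this reduces to regularized linear CCA; the canonical variates of each modality are \(K_{x}\alpha\) and \(K_{y}\beta\).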

1.5 A-5. KCFA Method

The kernel CFA approach [12] can provide the correct information association even when the two modalities are not linearly related. To illustrate this, let \(X=(\phi(x_{1}),\phi(x_{2}),\mathellipsis,\phi(x_{n}))^{T}\) and \(Y=(\psi(y_{1}),\psi(y_{2}),\mathellipsis,\psi(y_{n}))^{T}\) denote the two matrices whose rows are samples in the nonlinearly mapped feature space; the decomposition \(X^{T}Y=S_{xy}{\Lambda}_{xy}D_{xy}\) should then be solved through the kernel trick. The kernel matrices of the two feature subsets can be computed as \(K_{x}=XX^{T}\) and \(K_{y}=YY^{T}\). Performing an eigenvalue decomposition on the product \(K_{x}K_{y}\) shows that

$$\begin{array}{@{}rcl@{}} (K_{x}K_{y}) \boldsymbol{\beta }_{i}&=& \mathrm{\lambda }_{i}\boldsymbol{\beta }_{i} \end{array} $$
$$\begin{array}{@{}rcl@{}} \left( X X^{T}Y Y^{T} \right)\boldsymbol{\beta }_{i}&=&\mathrm{\lambda }_{i}\boldsymbol{\beta }_{i} \end{array} $$
$$\begin{array}{@{}rcl@{}} \left( Y^{T}X X^{T}Y \right)Y^{T}\boldsymbol{\beta }_{i}&=&\mathrm{\lambda }_{i}Y^{T}\boldsymbol{\beta }_{i} \end{array} $$
(34)

Since the right singular vectors of the SVD of \(X^{T}Y\), i.e., the columns of \(D_{xy}\), are the eigenvectors of \(Y^{T}XX^{T}Y=(X^{T}Y)^{T}(X^{T}Y)\), the vector \(Y^{T}\boldsymbol{\beta}_{i}\) corresponds to a column of \(D_{xy}\), which can be further normalized to unit norm as:

$$\begin{array}{@{}rcl@{}} v_{i}=\frac{Y^{T}\boldsymbol{\beta }_{i}}{\left\| Y^{T}\boldsymbol{\beta }_{i} \right\|}= \frac{Y^{T}\boldsymbol{\beta }_{i}}{\sqrt {\boldsymbol{\beta }_{i}^{T}YY^{T}\boldsymbol{\beta }_{i}} }= \frac{Y^{T}\boldsymbol{\beta }_{i}}{\sqrt {\boldsymbol{\beta }_{i}^{T}K_{y}\boldsymbol{\beta }_{i}} } \end{array} $$
(35)

For a new feature vector \(y^{\prime}\) with nonlinear mapping \(\psi(y^{\prime})\), the projection can be computed as

$$\begin{array}{@{}rcl@{}} {v_{i}^{T}}\mathrm{\psi }\left( y^{\prime} \right)&=&\left( \frac{Y^{T}\boldsymbol{\beta }_{i}}{\sqrt {\boldsymbol{\beta }_{i}^{T}K_{y}\boldsymbol{\beta }_{i}} } \right)^{T}\mathrm{\psi }\left( y^{\prime} \right) \end{array} $$
$$\begin{array}{@{}rcl@{}} &=& \frac{1}{\sqrt {\boldsymbol{\beta }_{i}^{T}K_{y}\boldsymbol{\beta }_{i}} }\boldsymbol{\beta }_{i}^{T}\left[ {\begin{array}{l} K\left( y^{\prime},y_{1} \right) \\ K\left( y^{\prime},y_{2} \right) \\ \mathellipsis \\ K(y^{\prime},y_{n}) \\ \end{array}} \right] \end{array} $$
(36)

Similarly, it can be shown that

$$\begin{array}{@{}rcl@{}} (K_{y}K_{x})\boldsymbol{\alpha }_{j}&=&\mathrm{\lambda }_{j}\boldsymbol{\alpha }_{j} \end{array} $$
$$\begin{array}{@{}rcl@{}} \left( X^{T}Y Y^{T}X \right)X^{T}\boldsymbol{\alpha }_{j}&=&\mathrm{\lambda }_{j}X^{T}\boldsymbol{\alpha }_{j} \end{array} $$
(37)

Since the left singular vectors of \(X^{T}Y\), the columns of \(S_{xy}\), are the eigenvectors of \(X^{T}YY^{T}X=(X^{T}Y)(X^{T}Y)^{T}\), the vector \(X^{T}\boldsymbol{\alpha}_{j}\) corresponds to a column of \(S_{xy}\), which can be normalized to unit norm as:

$$\begin{array}{@{}rcl@{}} \boldsymbol{\mu }_{j}=\frac{X^{T}\boldsymbol{\alpha }_{j}}{\left\| X^{T}\boldsymbol{\alpha }_{j} \right\|}= \frac{X^{T}\boldsymbol{\alpha }_{j}}{\sqrt {\boldsymbol{\alpha }_{j}^{T}XX^{T}\boldsymbol{\alpha }_{j}} }= \frac{X^{T}\boldsymbol{\alpha }_{j}}{\sqrt {\boldsymbol{\alpha }_{j}^{T}K_{x}\boldsymbol{\alpha }_{j}} } \end{array} $$
(38)

Letting \(x^{\prime}\) be a feature vector in the original domain with nonlinear mapping \(\phi(x^{\prime})\), the feature vector in the cross-modal associated domain can be computed as:

$$\begin{array}{@{}rcl@{}} \boldsymbol{\mu }_{j}^{T}\mathrm{\phi (}x^{\prime}\mathrm{)}&=& \left( \frac{X^{T}\mathbf{\alpha }_{j}}{\sqrt {\mathbf{\alpha }_{j}^{T}K_{x}\mathbf{\alpha }_{j}} } \right)^{T}\mathrm{ \phi (}x^{\prime}\mathrm{)} \end{array} $$
$$\begin{array}{@{}rcl@{}} &=& \frac{1}{\sqrt {\mathbf{\alpha }_{j}^{T}K_{x}\mathbf{\alpha }_{j}} }\boldsymbol{\alpha }_{j}^{T}\left[ {\begin{array}{l} K\left( x^{\prime},x_{1} \right) \\ K\left( x^{\prime},x_{2} \right) \\ \mathellipsis \\ K(x^{\prime},x_{n}) \\ \end{array}} \right] \end{array} $$
(39)
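The whole KCFA recipe (Eqs. (34)–(39)) can be condensed into a short dual-space sketch. The helper names are ours, and the Gram matrices are assumed precomputed and centered; this is an illustration of the derivation, not the authors' implementation.

```python
import numpy as np

def kcfa(Kx, Ky, d=1):
    """KCFA dual coefficients: beta from eig(Kx Ky) (Eq. 34), alpha from
    eig(Ky Kx) (Eq. 37), scaled per the unit-norm conditions (35)/(38)."""
    def top(M, d):
        vals, vecs = np.linalg.eig(M)
        return vecs.real[:, np.argsort(-vals.real)[:d]]
    B = top(Kx @ Ky, d)                                  # beta_i
    A = top(Ky @ Kx, d)                                  # alpha_j
    B = B / np.sqrt(np.einsum('ni,nm,mi->i', B, Ky, B))  # beta^T Ky beta = 1
    A = A / np.sqrt(np.einsum('ni,nm,mi->i', A, Kx, A))  # alpha^T Kx alpha = 1
    return A, B

def kcfa_project(coef, k_new):
    """Eq. (36)/(39): project new samples via their kernel rows k(., x_1..n)."""
    return k_new @ coef
```

After the scaling step, the implicit feature-space directions \(v_{i}=Y^{T}\boldsymbol{\beta}_{i}\) and \(\boldsymbol{\mu}_{j}=X^{T}\boldsymbol{\alpha}_{j}\) have unit norm, exactly as required by (35) and (38).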


Cite this article

Sarvestani, R.R., Boostani, R. FF-SKPCCA: Kernel probabilistic canonical correlation analysis. Appl Intell 46, 438–454 (2017). https://doi.org/10.1007/s10489-016-0823-x
