A joint latent factor analyzer and functional subspace model for clustering multivariate functional data

Statistics and Computing

Abstract

We introduce a model-based approach for clustering multivariate functional data observations. We utilize theoretical results regarding a surrogate density on the truncated Karhunen–Loève expansions along with a direct sum specification of the functional space to define a matrix normal distribution on functional principal components. This formulation allows for individual parsimonious modelling of the function space and coefficient space of the univariate components of the multivariate functional observations in the form of a subspace projection and latent factor analyzers, respectively. The approach facilitates interpretation at both the full multivariate level and the component level, which is of specific interest when the component functions have clear meaning. We derive an AECM algorithm for fitting the model, and discuss appropriate initialization strategies, convergence and model selection criteria. We demonstrate the model’s applicability through simulation and two data analyses on observations that have many functional components.

Author information

Corresponding author

Correspondence to Alex Sharp.


Appendices

Countries included in the energy sector analysis

We gather complete data on the following 97 countries (Tables 11, 12):

Table 11 Countries used in the energy sector analysis, sorted by geographical location and listed in alphabetical order

The following table lists the countries assigned to each group in alphabetical order.

Table 12 Countries used in the energy sector analysis sorted by the best BIC model grouping and listed in alphabetical order

Code for scraping pitch data

An R function for scraping the pitch data from MLBAM’s Statcast database is given in figure a.
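
Since that listing is rendered only as a figure, the following is a minimal sketch of what such a scraper could look like; it is written for illustration and is not the authors’ code. The Baseball Savant CSV endpoint, query parameters, and example pitcher ID are assumptions about the public search interface rather than details given in the paper.

```r
# Illustrative sketch only: download pitch-level Statcast data for one pitcher
# and date range as a data frame. The CSV endpoint and query parameters below
# are assumptions based on the public Baseball Savant search page.
scrape_pitches <- function(pitcher_id, start_date, end_date) {
  base_url <- "https://baseballsavant.mlb.com/statcast_search/csv"
  query <- paste0(
    "?all=true&type=details&player_type=pitcher",
    "&pitchers_lookup%5B%5D=", pitcher_id,
    "&game_date_gt=", start_date,
    "&game_date_lt=", end_date
  )
  utils::read.csv(paste0(base_url, query), stringsAsFactors = FALSE)
}

# Example call (hypothetical MLBAM pitcher ID and dates):
# pitches <- scrape_pitches(543037, "2019-04-01", "2019-09-30")
```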

Parameter specification for the model selection and parameter recovery simulation

In our parameter recovery simulation, we specify three simulation parameters whose values we vary across the different implementations; every other model parameter is held fixed across these implementations. In this section we give a brief overview of how these parameters were generated. A clever specification for these parameters eluded us, so we instead generated them randomly. The generation process was the same for each group and proceeded in the following manner. The mean matrix \(\varvec{M}_g^\star \) was generated from a matrix normal distribution specified as \(\mathscr {N}_{p\times d_g}(\varvec{0}, \varvec{I}_p, 4\varvec{I}_{d_g})\). Next, we created a \(2p \times p\) matrix filled with iid samples from a standard normal distribution, estimated the covariance matrix of these data, and set \(\varvec{\varLambda }_{1g}\) to be the first \(q_g\) eigenvectors of the corresponding spectral decomposition. Let \(\mathscr {U}_1 \sim \text {Unif}(50,100)\) and \(\mathscr {U}_2 \sim \text {Unif}(0.5,5)\) be two uniform random variables. Let \(\varvec{\varOmega }_g\) be a \(d_g \times d_g\) diagonal matrix with diagonal elements composed of iid draws from \(\mathscr {U}_1\), and let \(\eta _g\) be a single draw from \(\mathscr {U}_2\). We then constructed a b-dimensional diagonal matrix \(\varvec{\varDelta }_g\) by specifying the diagonal to be \(\varvec{\varOmega }_g\) followed by \(b-d_g\) copies of \(\eta _g\), and set,

$$\begin{aligned} \varvec{\varOmega }_{2g}&= \left| \varvec{\varDelta }_g\right| ^{-1/b} \varvec{\varOmega }_g, \quad \text {and,}\\ \eta _{2g}&= \left| \varvec{\varDelta }_g\right| ^{-1/b} \eta _g. \end{aligned}$$

The associated matrix of eigenvectors, \(\varvec{\varGamma }_{2g}\), was generated randomly from the uniform distribution over the \(b \times b\) orthogonal matrices. This completes the parameter generation process.
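
As a concrete illustration, the R snippet below sketches this generation process for a single group. The dimensions \(p\), \(b\), \(d_g\) and \(q_g\) are placeholder values, and the code illustrates the procedure described above rather than reproducing the implementation used in the paper.

```r
# Sketch of the random parameter generation for one group (toy dimensions).
set.seed(1)
p <- 10; b <- 15; d_g <- 3; q_g <- 2

# Mean matrix M_g* ~ N_{p x d_g}(0, I_p, 4 I_{d_g}): entries are iid N(0, 4).
M_star <- matrix(rnorm(p * d_g, sd = 2), nrow = p, ncol = d_g)

# Loadings Lambda_1g: leading q_g eigenvectors of the sample covariance of a
# (2p x p) matrix of iid standard normal draws.
X <- matrix(rnorm(2 * p * p), nrow = 2 * p, ncol = p)
Lambda_1g <- eigen(cov(X))$vectors[, 1:q_g, drop = FALSE]

# Omega_g (d_g x d_g diagonal, Unif(50, 100) entries) and eta_g (Unif(0.5, 5)).
omega_g <- runif(d_g, 50, 100)
eta_g   <- runif(1, 0.5, 5)

# Delta_g: b-dimensional diagonal of Omega_g followed by copies of eta_g;
# rescaling by |Delta_g|^(-1/b) makes the rescaled diagonal matrix have unit
# determinant, giving Omega_2g and eta_2g.
delta_g   <- c(omega_g, rep(eta_g, b - d_g))
scale_fac <- prod(delta_g)^(-1 / b)
Omega_2g  <- diag(scale_fac * omega_g, d_g)
eta_2g    <- scale_fac * eta_g

# Gamma_2g: a random b x b orthogonal matrix via the QR decomposition of a
# Gaussian matrix (a sign correction on R's diagonal would make it exactly
# Haar-uniform).
Gamma_2g <- qr.Q(qr(matrix(rnorm(b * b), b, b)))
```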

Parameter specification for comparative analysis II

As mentioned in Sect. 4.4, MFSF and the funHDDC model overlap when we specify our factor loadings and specific variances to have the form \(\varvec{\varLambda }_{1g} = \varvec{\varGamma }_{1g}(\varvec{\varOmega }_{1g} - \eta _{1g}\varvec{I}_{q_g})^\frac{1}{2}\) and \(\varvec{\varXi }_{1g}=\eta _{1g}\varvec{I}_p\), respectively, where \(\varvec{\varOmega }_{1g}\) is a diagonal \(q_g \times q_g\) matrix and \(\eta _{1g}\) is a positive real number smaller than every entry of \(\varvec{\varOmega }_{1g}\). Under this specification, \(\varvec{\Sigma }_{1g} = \varvec{\varLambda }_{1g}\varvec{\varLambda }_{1g}^\top + \varvec{\varXi }_{1g} = \varvec{\varGamma }_{1g}\varvec{\varDelta }_{1g}\varvec{\varGamma }_{1g}^\top \), where \(\varvec{\varDelta }_{1g}\) has the subspace clustering form given in Eq. (28) with \(\varvec{\varOmega }_{1g}\) in place of \(\varvec{\varOmega }_{g}\) and \(\eta _{1g}\) replacing \(\eta _g\). Hence, \(\varvec{\Sigma }_{1g}\) has both the factor analyzer and the subspace clustering form, and the resulting group covariance matrix then has the subspace clustering form with \(\varvec{\varOmega }_g\) given by Eq. (31) and \(\eta _g\) given by \(\eta _{1g}\eta _{2g}\). Under this parameter specification, MFSF and funHDDC therefore overlap.
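
This equality of the two forms can be checked numerically. The R snippet below, a minimal sketch with toy dimensions and arbitrary values for \(\varvec{\varGamma }_{1g}\), \(\varvec{\varOmega }_{1g}\) and \(\eta _{1g}\), constructs \(\varvec{\Sigma }_{1g}\) from the factor-analyzer parametrization and confirms that its leading \(q_g\) eigenvalues equal the entries of \(\varvec{\varOmega }_{1g}\) while the trailing \(p-q_g\) eigenvalues all equal \(\eta _{1g}\); it is an illustrative check rather than code from the paper.

```r
# Numeric check of the overlap specification (toy dimensions).
set.seed(2)
p <- 6; q <- 2

Gamma_1 <- qr.Q(qr(matrix(rnorm(p * p), p, p)))[, 1:q]  # orthonormal columns
omega_1 <- sort(runif(q, 2, 5), decreasing = TRUE)      # diagonal of Omega_1
eta_1   <- 0.5                                          # smaller than all omega_1

# Factor-analyzer form: Sigma_1 = Lambda_1 Lambda_1' + eta_1 I_p,
# with Lambda_1 = Gamma_1 (Omega_1 - eta_1 I_q)^{1/2}.
Lambda_1 <- Gamma_1 %*% diag(sqrt(omega_1 - eta_1), q)
Sigma_1  <- Lambda_1 %*% t(Lambda_1) + eta_1 * diag(p)

# Subspace-clustering form: the leading q eigenvalues equal omega_1 and the
# trailing p - q eigenvalues all equal eta_1.
round(eigen(Sigma_1)$values, 6)
```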

For our comparative analysis, this model specification serves as the basis for M3, the setting in which the parameter specification satisfies the requirements of both the funHDDC algorithm and MFSF. With this starting point, we devise a way to deterministically perturb these parameters so that they satisfy only one of the competing models rather than both. To do this, we need to identify a defining characteristic of each model that is not important for the other. For the funHDDC model, that characteristic is the constant value of the trailing eigenvalues, while for MFSF it is the presence of the Kronecker product form. We begin with the former. All subsequent discussion pertains to a particular, but arbitrary, group g of the model, and we hence drop the subscript g in the remainder.

The overlap model is characterized by the dual specification of a latent factor model and a latent subspace model through \(\varvec{\Sigma }_{1}\). However, as noted in Sect. 4.4, when \(\varvec{\Sigma }_{1}\) does not exhibit the latent subspace form, the funHDDC model no longer holds. Hence, our goal is to find a transformation of \(\varvec{\Sigma }_{1}\) that weakens or removes its latent subspace structure while preserving its latent factor structure. One obvious way to do this is to alter the specific variances \(\varvec{\varXi }_1\). In particular, the latent subspace structure requires \(\varvec{\varXi }_1\) to be spherical, so altering its diagonal values so that they differ from one another causes \(\varvec{\Sigma }_1\) to deviate from this model. To this end, let \(\{\delta _{1i}\}\) denote the trailing \(p - q\) eigenvalues of \(\varvec{\Sigma }_{1}\). Under the latent subspace model, \(\delta _{1i} = \eta _1\) for each i. We let the following linear relationship define the \(\delta _{1i}\),

$$\begin{aligned} \delta _{1i} = \eta _1 + a \left[ \frac{\omega _{1q} - \eta _1}{p - q -1} \right] (p - q - i ), \quad i =1,2,..., p-q, \end{aligned}$$

where \(\omega _{1q}\) is the smallest element of \(\varvec{\varOmega }_1\), and a is a value between 0 and 1. When \(a=0\), we recover the latent subspace structure, while \(a=1\) results in the \(\delta _{1i}\) being equally spaced along the line between \(\omega _{1q}\) and \(\eta _1\). A graphical example of how this changes the eigenvalues of \(\varvec{\Sigma }_{1}\) for different values of a is provided in Fig. 10.

Fig. 10

A depiction of the trailing eigenvalues for different values of the parameter a. We see that as a increases, the eigenvalues move up like a drawbridge. This eliminates the subspace structure from \(\varvec{\Sigma }_1\)

In this figure, \(q=2\), and hence the first two points plotted correspond to \(\varvec{\varOmega }_1\). The case \(a=0\), which corresponds to the black line, gives a constant line at \(\eta _1\), recovering the latent subspace structure. As a increases, the eigenvalues are lifted above \(\eta _1\) at different rates, causing them to take different values and eliminating the latent subspace structure. For our simulation, the value \(a=0.5\) corresponds to M4 and the value \(a=1\) corresponds to M5.
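
A minimal R sketch of this perturbation is given below; the values of \(p\), \(q\), \(\eta _1\) and \(\omega _{1q}\) are placeholders chosen only to show how the trailing eigenvalues change with a.

```r
# "Drawbridge" perturbation of the trailing eigenvalues of Sigma_1:
# a = 0 keeps them constant at eta_1 (M3), a = 0.5 gives M4, and a = 1
# spaces them evenly between omega_1q and eta_1 (M5).
trailing_eigenvalues <- function(a, eta_1, omega_1q, p, q) {
  i <- seq_len(p - q)
  eta_1 + a * (omega_1q - eta_1) / (p - q - 1) * (p - q - i)
}

# Example with p = 8, q = 2, eta_1 = 0.5, omega_1q = 3 (columns: a = 0, 0.5, 1).
sapply(c(0, 0.5, 1), trailing_eigenvalues,
       eta_1 = 0.5, omega_1q = 3, p = 8, q = 2)
```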

A central component of MFSF is the assumption that the model covariance matrix is formed as the Kronecker product of two lower dimensional matrices. Under parameter specifications that overlap with the funHDDC model, this gives the \(\varvec{\varOmega }\) matrix the form in Eq. (28). From this structure, we see that \(\varvec{\varOmega }\) will always have repeated eigenvalues, owing to the terms involving \(\eta _1\varvec{\varOmega }_2\) and \(\eta _2\varvec{\varOmega }_1\). This property is a direct result of the Kronecker product assumption; hence, if we transform \(\varvec{\varOmega }\) so that no repeated values appear, the resulting model will no longer satisfy the MFSF modelling assumptions. Note that under the funHDDC assumptions, \(\varvec{\varOmega }\) is arbitrary (aside from being diagonal with nonnegative entries), so these assumptions will still be satisfied.

Let \((\omega _{ij})\) be the sorted vector of the repeated eigenvalues of \(\varvec{\varOmega }\) under M3, where i indexes the unique eigenvalues, and j indexes the repetitions of each, and let \(\{\omega _i\}\) be the corresponding set of unique values. We specify the relationship between the eigenvalues using a linear model. If \(\omega _i\) belongs to \(\eta _1\varvec{\varOmega }_2\) then,

$$\begin{aligned} \omega _{ij} = \omega _{i} + a\left[ \frac{\omega _{i-1} - \omega _i}{p-q}\right] (p-q-j), \end{aligned}$$

where a is again some value between 0 and 1. If \(\omega _i\) belongs to \(\eta _2\varvec{\varOmega }_1\), then,

$$\begin{aligned} \omega _{ij} = \omega _{i} + a\left[ \frac{\omega _{i-1} - \omega _i}{b-d}\right] (b-d-j). \end{aligned}$$

This approach works in exactly the same manner as the previous one. By increasing a, we raise each set of repeated eigenvalues like a drawbridge towards the preceding eigenvalue. Doing so removes all repetitions among the eigenvalues, and hence removes the Kronecker structure. Setting \(a=0.5\) corresponds to model M2 and \(a=1\) corresponds to model M1.
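
The R sketch below illustrates this second perturbation on a single block of repeated eigenvalues; the block size and eigenvalue values are placeholders rather than those used in the simulation.

```r
# Lift one block of repeated eigenvalues towards the preceding unique
# eigenvalue: at a = 0 the block stays at its M3 value omega_i, and as a
# increases the repetitions spread out, removing the Kronecker structure.
lift_block <- function(a, omega_prev, omega_i, n_rep) {
  j <- seq_len(n_rep)
  omega_i + a * (omega_prev - omega_i) / n_rep * (n_rep - j)
}

# A block of p - q = 4 repetitions of omega_i = 2 below omega_{i-1} = 5
# (columns: a = 0, 0.5, 1).
sapply(c(0, 0.5, 1), lift_block, omega_prev = 5, omega_i = 2, n_rep = 4)
```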

In our study, for each of the 12 scenarios, we generated one parameter set according to the specification M3 and then modified it according to the rules described above to obtain the parameters for models M1–M5. Lacking a clever way of choosing the particular values of these parameters ourselves, we resorted to generating them randomly. Generation proceeded in the following manner. The mean matrix for the first group, denoted by \(\varvec{M}_1\), was generated from \(\mathscr {N}_{p\times b}(\varvec{0}, \varvec{I}_p, \varvec{I}_b)\), the standard matrix normal distribution. The mean matrix of the second group, denoted by \(\varvec{M}_2\), was obtained by adding to \(\varvec{M}_1\) a random pb-dimensional vector of length \(\rho \), so that \(\left\Vert \varvec{M}_1 - \varvec{M}_2\right\Vert = \rho \), where \(\rho \) is specified by each of the experimental conditions. Define the random variables \(\mathscr {U}_1 \sim \text {Unif}(5,5.5)\) and \(\mathscr {U}_2 \sim \text {Unif}(0.5,5)\). Let \(\varvec{\varOmega }_g^\star \) be a \(q_g \times q_g\) diagonal matrix with elements composed of iid draws from \(\mathscr {U}_1\), and let \(\eta _g^\star \) be a single draw from \(\mathscr {U}_2\). Construct a p-dimensional diagonal matrix \(\varvec{\varDelta }_g^\star \) from these by specifying the diagonal as \(\varvec{\varOmega }_g^\star \) followed by \(p-q_g\) copies of \(\eta _g^\star \). We then set,

$$\begin{aligned} \varvec{\varOmega }_{1g}&= \left| \varvec{\varDelta }_g^\star \right| ^{-1/p} \varvec{\varOmega }_g^\star , \quad \text {and,}\\ \eta _{1g}&= \left| \varvec{\varDelta }_g^\star \right| ^{-1/p} \eta _g^\star \end{aligned}$$

from which we can then construct \(\varvec{\varGamma }_{1g}\) and \(\varvec{\varXi }_{1g}\). The parameter \(\varvec{\varDelta }_{2g}\) is generated similarly, but with p replaced by b and \(q_g\) replaced with \(d_g\). For simplicity, we specify that the eigenvector matrix is equal to the identity for each group. In each of the 12 experimental conditions, the hyperparameters \(q_g\) and \(d_g\) are set to 2 and 3, respectively. This results in a value of k for the funHDDC model of 34 in the low-dimensional settings and 114 in the high-dimensional settings.
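
As an illustration of the group-mean construction described above, the R snippet below generates \(\varvec{M}_1\) and shifts it by a random direction rescaled to length \(\rho \), so that \(\left\Vert \varvec{M}_1 - \varvec{M}_2\right\Vert = \rho \); the dimensions and the value of \(\rho \) are placeholders, not the settings used in the study.

```r
# Sketch of the group-mean construction (toy dimensions and separation).
set.seed(3)
p <- 5; b <- 4; rho <- 2

M_1 <- matrix(rnorm(p * b), nrow = p, ncol = b)  # standard matrix normal

shift <- rnorm(p * b)                            # random direction in R^{pb}
shift <- rho * shift / sqrt(sum(shift^2))        # rescale to length rho
M_2   <- M_1 + matrix(shift, nrow = p, ncol = b)

sqrt(sum((M_1 - M_2)^2))                         # equals rho up to rounding
```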

About this article

Cite this article

Sharp, A., Browne, R. A joint latent factor analyzer and functional subspace model for clustering multivariate functional data. Stat Comput 32, 68 (2022). https://doi.org/10.1007/s11222-022-10128-9
