Fréchet distance-based cluster analysis for multi-dimensional functional data

  • Original Paper
  • Statistics and Computing

Abstract

Multi-dimensional functional data analysis has become a contemporary research topic in medical research as patients’ various records are measured over time. We propose two clustering methods using the Fréchet distance for multi-dimensional functional data. The first method extends an existing K-means type approach from one-dimensional to multi-dimensional longitudinal data. The second method enforces sparsity on functional variables while grouping observed trajectories and enables us to assess the contribution from each variable. Both methods utilize the generalized Fréchet distance to measure the distance between trajectories with irregularly spaced and asynchronous measurements. We demonstrate the effectiveness of the proposed methods through a comparative study using various simulation examples. Then, we apply the sparse clustering method to multi-dimensional thyroid cancer data collected in South Korea. It produces interpretable clusters and weighs the importance of functional variables.

Acknowledgements

We thank the two anonymous reviewers for their helpful comments. We also thank Jinwoo Jeong (Gunn High School) for proofreading.

Funding

Hosik Choi’s research was supported by the Basic Science Research Program through the NRF funded by the Ministry of Education (2017R1D1A1B05028565). Young Joo Yoon’s work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1F1A1A01054878). Soon-Sun Kwon’s work was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2017R1E1A1A03070345, 2021R1A6A1A10044950). Cheolwoo Park’s work was supported in part by Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1A2C1092925).

Author information

Corresponding authors

Correspondence to Soon-Sun Kwon or Cheolwoo Park.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 615 KB)

Appendices

Appendix A: The Generalized Fréchet distance

Theorem 1

The generalized Fréchet distance is a metric for all \(\lambda > 0\).

Proof

By abuse of notation, we write

$$\begin{aligned} P(x) \leftarrow (x, P(x)) \in \mathbb {R}^2 \quad \text {and}\quad Q(x) \leftarrow (x, Q(x))\in \mathbb {R}^2. \end{aligned}$$

To show that \(\text {FD}_\lambda (\cdot , \cdot )\) is a metric, it is enough to verify the triangle inequality:

$$\begin{aligned} \text {FD}_\lambda (P,Q) + \text {FD}_\lambda (Q, R) \ge \text {FD}_\lambda (P,R) \end{aligned}$$

for all trajectories \(P\), \(Q\), and \(R\).

Let \(\epsilon >0\) be given arbitrarily. By the definition of infimum, there exist reparametrizations \(\alpha _1, \alpha _2, \beta _1, \beta _2\) such that

$$\begin{aligned} \begin{aligned} \max _t d(P\circ \alpha _1(\lambda t), Q\circ \beta _1(\lambda t))&< \text {FD}_\lambda (P,Q) + \dfrac{\epsilon }{3} \\ \max _t d(Q\circ \alpha _2(\lambda t), R\circ \beta _2(\lambda t))&< \text {FD}_\lambda (Q,R) + \dfrac{\epsilon }{3}. \end{aligned} \end{aligned}$$
(A1)

To simplify our argument, we assume that the reparametrizations are strictly increasing so that, in particular, \(\alpha _2\) is invertible. Then also by the definition of maximum, there exists \(t_\epsilon \in [0, \frac{T}{\lambda }]\) such that

$$\begin{aligned} d(P\circ \alpha _1(\lambda t_\epsilon ), R\circ \beta _2\circ \alpha _2^{-1}\circ \beta _1(\lambda t_\epsilon )) > \max _t d(P\circ \alpha _1(\lambda t), R\circ \beta _2\circ \alpha _2^{-1}\circ \beta _1(\lambda t)) - \frac{\epsilon }{3}. \end{aligned}$$
(A2)

Let \(u_\epsilon = \frac{1}{\lambda }\alpha _2^{-1}\circ \beta _1(\lambda t_\epsilon )\) so that \(\beta _1(\lambda t_\epsilon ) = \alpha _2 (\lambda u_\epsilon )\). Combining (A1) and (A2), we have

$$\begin{aligned} \frac{2}{3}\epsilon + \text {FD}_\lambda (P,Q) + \text {FD}_\lambda (Q, R)&> d(P\circ \alpha _1(\lambda t_\epsilon ), Q\circ \beta _1(\lambda t_\epsilon )) + d(Q\circ \alpha _2(\lambda u_\epsilon ), R\circ \beta _2(\lambda u_\epsilon )) \\&\ge d(P\circ \alpha _1(\lambda t_\epsilon ), R\circ \beta _2(\lambda u_\epsilon )) \\&> \max _t d(P\circ \alpha _1(\lambda t), R\circ \beta _2\circ \alpha _2^{-1}\circ \beta _1(\lambda t)) - \frac{\epsilon }{3} \\&\ge \text {FD}_\lambda (P,R) - \frac{\epsilon }{3}, \end{aligned}$$

and by letting \(\epsilon \rightarrow 0\), we obtain the desired result.
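As a numerical companion to the proof, the following is a minimal sketch of a discrete Fréchet distance between two sampled trajectories, computed with the standard dynamic program over sample points. It is not the authors' implementation: the function name `discrete_frechet` is hypothetical, and treating \(\lambda\) as a multiplier on the time coordinate before Euclidean distances are taken is an assumption, since this appendix does not restate the definition of \(\text {FD}_\lambda\). Because the two trajectories may have different numbers of points at different times, the sketch also covers irregular and asynchronous measurements.

```python
import math

def discrete_frechet(p, q, lam=1.0):
    """Discrete Frechet distance between two sampled curves.

    p, q: lists of (t, y) pairs, possibly of different lengths.
    lam:  assumed time-scale factor; each time t is replaced by lam * t.
    """
    P = [(lam * t, y) for t, y in p]
    Q = [(lam * t, y) for t, y in q]
    n, m = len(P), len(Q)
    # ca[i][j] = coupling distance for the prefixes P[:i+1] and Q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]

# Two flat trajectories one unit apart, observed at the same times.
P = [(0, 0.0), (1, 0.0), (2, 0.0)]
Q = [(0, 1.0), (1, 1.0), (2, 1.0)]
print(discrete_frechet(P, Q))  # 1.0
```

The triangle inequality of Theorem 1 can be spot-checked with this function on any three sampled trajectories, for several values of `lam`.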

Appendix B: Derivation of Eq. (9)

Let us consider the following problem with linear and quadratic constraints:

$$\begin{aligned} \min _{\varvec{w}} \left( -\sum _{j=1}^{p}a_jw_j \right) \quad \text{subject to } \Vert \varvec{w}\Vert _1 \le s, \quad \Vert \varvec{w}\Vert _2^2 \le 1, \quad w_j\ge 0 ~\forall j. \end{aligned}$$
(B3)

Note that the problem (B3) is well-defined when \(1 \le s \le \sqrt{p}\); if \(s > \sqrt{p},\) the \(l_1\) constraint is redundant, and if \(s < 1,\) the \(l_2\) constraint is redundant.

For \(1 \le s \le \sqrt{p}\), the Lagrangian function L for (B3) is given as:

$$\begin{aligned} L = -\sum _{j=1}^{p}a_jw_j +\lambda _1(\Vert \varvec{w}\Vert _1-s) +\lambda _2(\Vert \varvec{w}\Vert _2^2-1) -\sum _{j=1}^p\lambda _{3j}w_j. \end{aligned}$$

Then, the Karush-Kuhn-Tucker (KKT) conditions of the problem are as follows:

$$\begin{aligned} -a_j + \lambda _1 \text{sgn}(w_j) + 2\lambda _2w_j-\lambda _{3j}&=0 \quad \text{for } w_j \ne 0, \\ |-a_j -\lambda _{3j}|&\le \lambda _1 \quad \text{for } w_j = 0. \end{aligned}$$
(B4)

In this case, the complementary slackness conditions are given as:

$$\begin{aligned} \lambda _1(\Vert \varvec{w}\Vert _1-s) = 0, \quad \lambda _2(\Vert \varvec{w}\Vert ^2_2-1) = 0, \end{aligned}$$
(B5)
$$\begin{aligned} \lambda _{3j}w_j = 0 \quad \text{for all } j, \end{aligned}$$
(B6)

and the primal feasibility conditions are \(w_j \ge 0,\) \(\Vert \varvec{w}\Vert _1 \le s\) and \(\Vert \varvec{w}\Vert _2^2 \le 1.\) Also, the dual feasibility conditions are \(\lambda _1\ge 0, \lambda _2 \ge 0, \lambda _{3j} \ge 0, \forall j.\)

If \(\Vert \varvec{w}\Vert _2^2 < 1,\) \(\lambda _2=0\) by (B5). Then, for \(w_j \ne 0\), we have \(a_j + \lambda _{3j} =\lambda _1\) by (B4), which means \(\lambda _{3j} =-a_j+\lambda _1\), but \(\lambda _{3j}\) cannot be zero for all j. Hence, \(w_j=0\) from (B6), but this contradicts the initial assumption, \(w_j \ne 0\). Therefore, the optimal solution should lie in \(\Vert \varvec{w}\Vert _2^2 = 1.\)

We note that if \(a_j \le 0\), then \(w_j=0\) from the following argument. For \(w_j \ne 0\), \(\lambda _{3j}=0\) by (B6), and in turn \(-a_j + \lambda _1 \text{ sgn }(w_j) + 2\lambda _2w_j=0\) in (B4), which cannot be met if \(w_j>0\). Next, we consider the case \(a_j>0\). From (B4), \({w}_j=\frac{1}{2\lambda _2}\max (a_j-\lambda _1,0).\) Since \(\Vert \varvec{w}\Vert _2^2=1,\) we normalize \(\varvec{w}\) with its \(l_2\) norm. This satisfies the unit \(l_2\) norm, and it should also satisfy the condition, \(\Vert \varvec{w}\Vert _1 \le s\). From here on, \(\varvec{w}\) has the unit \(l_2\) norm.

The final step is to determine \(\lambda _1\). When \(\lambda _1>0\), \(\Vert \varvec{w}\Vert _1=s\) from (B5). Set \(w_j=0\) when the corresponding \(a_j\) is non-positive as argued before. So, without loss of generality, assume that all \(a_j\) values are positive. Let the order statistics of \(a_j\) be \(a_{(1)}> a_{(2)}> \cdots> a_{(p)}> 0\), and \(\mathcal {A}= \{j |a_j > \lambda _1, j =1, \ldots ,p\}.\) If we assume \(a_{(l)}>\lambda _1>a_{(l+1)}\), \(|\mathcal {A}|=l\). Because \(\Vert \varvec{w} \Vert _1=s\) and \(\Vert \varvec{w} \Vert _2^2=1\), the normalized weights satisfy \(\sum _{j=1}^p w_j=\sum _{j=1}^l (a_{(j)}-\lambda _1) \big / \sqrt{\sum _{j=1}^{l} (a_{(j)}-\lambda _1)^2}=s\), that is, \(\sum _{j=1}^l (a_{(j)}-\lambda _1)=s \sqrt{\sum _{j=1}^{l} (a_{(j)}-\lambda _1)^2}\), and then we have

$$\begin{aligned} \lambda ^{(l)}_{1\pm }=\frac{\sum _{j=1}^{l} a_{(j)}}{l} \pm \frac{1}{\sqrt{l}}\sqrt{\frac{(\sum _{j=1}^{l} a_{(j)})^2}{l}-\frac{(\sum _{j=1}^{l} a_{(j)})^2-(\sum _{j=1}^{l} a_{(j)}^2)s^2}{l-s^2}}. \end{aligned}$$

Thus,

$$\begin{aligned} \lambda _1=\min _{1\le l \le p} \min (\lambda ^{(l)}_{1+}, \lambda ^{(l)}_{1-}). \end{aligned}$$

Here, we take the minimum to avoid overly sparse solutions.
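For readers who want the intermediate algebra, the following is a sketch of how the quadratic in \(\lambda _1\) arises, assuming \(l \ne s^2\). The shorthand \(S_1=\sum _{j=1}^{l} a_{(j)}\) and \(S_2=\sum _{j=1}^{l} a_{(j)}^2\) is introduced here for brevity and is not part of the original notation. Squaring \(S_1 - l\lambda _1 = s\sqrt{S_2 - 2\lambda _1 S_1 + l\lambda _1^2}\) gives

$$\begin{aligned} (S_1 - l\lambda _1)^2&= s^2\,(S_2 - 2\lambda _1 S_1 + l\lambda _1^2), \\ l(l-s^2)\lambda _1^2 - 2S_1(l-s^2)\lambda _1 + (S_1^2 - s^2 S_2)&= 0, \\ \lambda _1&= \frac{S_1}{l} \pm \frac{1}{\sqrt{l}}\sqrt{\frac{S_1^2}{l} - \frac{S_1^2 - s^2 S_2}{l-s^2}}, \end{aligned}$$

which agrees with the expression for \(\lambda ^{(l)}_{1\pm }\). Because squaring can introduce a spurious root, a candidate should still be checked against the unsquared condition \(S_1 - l\lambda _1 \ge 0\) and against \(a_{(l)}>\lambda _1>a_{(l+1)}\). As a numerical illustration with \(a=(3,2,1)\), \(s=1.2\), and \(l=2\), we have \(S_1=5\) and \(S_2=13\), and the smaller root is \(\lambda _1\approx 1.698\), which lies between \(a_{(3)}=1\) and \(a_{(2)}=2\).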

Lastly, because the condition \(\Vert \varvec{w}\Vert _1 \le s\) is unnecessary when \(\lambda _1=0\), we just normalize \(\varvec{w}\) such that \(\Vert \varvec{w}\Vert _2=1.\)
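The weight update derived in this appendix can be illustrated with a short sketch. Note the swap: instead of the closed-form \(\lambda ^{(l)}_{1\pm }\), the code below finds \(\lambda _1\) by bisection on the condition \(\Vert \varvec{w}\Vert _1 = s\), which is a common alternative for this kind of soft-thresholding problem and is easier to verify numerically. The function names `soft_threshold_weights` and `sparse_weights` are hypothetical, and the sketch assumes \(1 \le s \le \sqrt{p}\).

```python
import math

def soft_threshold_weights(a, lam1):
    """w_j proportional to max(a_j - lam1, 0), rescaled to unit l2 norm."""
    pos = [max(x - lam1, 0.0) for x in a]
    norm = math.sqrt(sum(v * v for v in pos))
    if norm == 0.0:
        return [0.0] * len(a)
    return [v / norm for v in pos]

def sparse_weights(a, s, tol=1e-10):
    """Maximize sum_j a_j w_j subject to ||w||_1 <= s, ||w||_2 = 1, w >= 0.

    Bisection on lam1 (an assumption of this sketch, not the closed form).
    """
    w = soft_threshold_weights(a, 0.0)
    if sum(w) <= s:
        # The l1 constraint is inactive, so lam1 = 0 and w is just normalized.
        return w
    lo, hi = 0.0, max(a)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(soft_threshold_weights(a, mid)) > s:
            lo = mid  # still too dense: increase the threshold
        else:
            hi = mid  # feasible: try a smaller threshold
    return soft_threshold_weights(a, hi)

w = sparse_weights([3.0, 2.0, 1.0], s=1.2)
print(w)  # the third weight is exactly zero; the first two are positive
```

Variables with non-positive \(a_j\) receive zero weight automatically, because \(\max (a_j-\lambda _1, 0)=0\) for every \(\lambda _1 \ge 0\).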

Appendix C: Simulation settings

1.1 Setting 1

There are two variables, and the profile shapes are given as below:

  • Variable 1

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \(\frac{1}{19}(16t+22)\) | \(\frac{1}{19}(16t+22)+Z, ~ Z \sim N(U(-1,1),2^2)\) |
| Group 2 | \(10\) | \(10+Z, ~ Z \sim N(U(-1,1),2^2)\) |
| Group 3 | \(-\frac{1}{19}(10t-295)\) | \(-\frac{1}{19}(10t-295)+Z, ~ Z \sim N(U(-1,1),2^2)\) |

  • Variable 2

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \(5\phi _{X}(t)+10,~ X \sim N(5,1)\) | \(Z_1 \phi _{X}(t)+Z_2,~X \sim N(U(2,8),1),~ Z_1 \sim N(5,3^2), ~ Z_2 \sim N(10,2^2)\) |
| Group 2 | \(30\sum _{i=1}^{3}\phi _{X_i}(t), ~ X_1 \sim N(5,1), X_2 \sim N(10,1), X_3 \sim N(15,1)\) | \(Z\sum _{i=1}^{3}\phi _{X_i}(t), ~ X_1 \sim N(U(2,8),1), X_2 \sim N(U(7,13),1), X_3 \sim N(U(12,18),1), Z \sim N(30,3^2)\) |
| Group 3 | \(45\phi _{X_1}(t)+35\phi _{X_2}(t), ~ X_1 \sim N(5,1), X_2 \sim N(15,1)\) | \(Z_{1}\phi _{X_1}(t)+Z_{2}\phi _{X_2}(t), ~ X_1 \sim N(U(2,8),1), X_2 \sim N(U(12,18),1), Z_1 \sim N(45,3^2),~ Z_2 \sim N(35,3^2)\) |

1.2 Setting 2

There are two variables, and the profile shapes are given as below and displayed in Fig. 2:

  • Variable 1

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \([1-\Phi _{X}(t)]\times 5, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(5,2^2)\) |
| Group 2 | \([1-\Phi _{X}(t)]\times 10, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(10,2^2)\) |
| Group 3 | \([1-\Phi _{X}(t)]\times 15, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(15,2^2)\) |

  • Variable 2

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \(\phi _{X}(t)\times 15, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(15,2^2)\) |
| Group 2 | \(\phi _{X}(t)\times 30, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(30,2^2)\) |
| Group 3 | \(\phi _{X}(t)\times 45, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(45,2^2)\) |
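To make the Setting 2 recipe concrete, here is a minimal sketch of drawing one sample trajectory per variable. It rests on several assumptions that this appendix does not state: \(\Phi _X\) and \(\phi _X\) are read as the distribution and density functions of \(X\), \(N(U(a,b),1)\) is read as a normal distribution whose mean is first drawn uniformly from \((a,b)\), and the time grid \(t=1,\ldots ,20\) is chosen only for illustration. The function name `setting2_sample` is hypothetical.

```python
import random
from statistics import NormalDist

def setting2_sample(group, times, rng):
    """Draw one (Variable 1, Variable 2) trajectory pair for a given group."""
    scale1 = {1: 5, 2: 10, 3: 15}[group]   # mean of Z for Variable 1
    scale2 = {1: 15, 2: 30, 3: 45}[group]  # mean of Z for Variable 2

    # Variable 1: [1 - Phi_X(t)] * Z, X ~ N(U(3,7), 1), Z ~ N(scale1, 2^2)
    x1 = NormalDist(rng.uniform(3, 7), 1)
    z1 = rng.gauss(scale1, 2)
    var1 = [(1 - x1.cdf(t)) * z1 for t in times]

    # Variable 2: phi_X(t) * Z, X ~ N(U(2,8), 1), Z ~ N(scale2, 2^2)
    x2 = NormalDist(rng.uniform(2, 8), 1)
    z2 = rng.gauss(scale2, 2)
    var2 = [x2.pdf(t) * z2 for t in times]
    return var1, var2

rng = random.Random(0)
times = list(range(1, 21))  # assumed observation times
var1, var2 = setting2_sample(group=1, times=times, rng=rng)
```

Irregular or asynchronous measurements could be mimicked by passing a different `times` list for each subject and variable.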

1.3 Setting 2a

There are two variables, and the profile shapes are given as below and displayed in Fig. 5. The setting is the same as Setting 2 except for the y-axis range of Variable 1.

  • Variable 1

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \([1-\Phi _{X}(t)]\times 10, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(10,6^2)\) |
| Group 2 | \([1-\Phi _{X}(t)]\times 20, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(20,6^2)\) |
| Group 3 | \([1-\Phi _{X}(t)]\times 30, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(30,6^2)\) |

1.4 Setting 2b

There are two variables, and the profile shapes are given as below and displayed in Fig. 5. The setting is the same as Setting 2 except for the y-axis range of Variable 2:

  • Variable 2

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \(\phi _{X}(t)\times 15, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(40,10^2)\) |
| Group 2 | \(\phi _{X}(t)\times 30, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(80,10^2)\) |
| Group 3 | \(\phi _{X}(t)\times 45, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(120,10^2)\) |

1.5 Setting 3

We add one more variable to Setting 2:

  • Variable 3

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \(\phi _{X}(t)\times 35, ~ X \sim N(4,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(3,5),1), ~ Z \sim N(35,2^2)\) |
| Group 2 | \(\phi _{X}(t)\times 45, ~ X \sim N(7,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(6,8),1), ~ Z \sim N(45,2^2)\) |
| Group 3 | \(\phi _{X}(t)\times 40, ~ X \sim 0.4N(3,1)+0.6N(7,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim 0.4N(3+U,1)+0.6N(7+U,1), ~ U \sim U(-1,1), ~Z \sim N(40,2^2)\) |

1.6 Setting 4

There are two variables, and the profile shapes are given as below:

  • Variable 1

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \([1-\Phi _{X}(t)]\times 10, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)] \times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(10,1)\) |
| Group 2 | \([1-\Phi _{X}(t)]\times 15, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(15,1)\) |
| Group 3 | \([1-\Phi _{X}(t)]\times 15, ~ X \sim N(5,1)\) | \([1-\Phi _{X}(t)]\times Z, ~ X \sim N(U(3,7),1), ~Z \sim N(15,1)\) |

  • Variable 2

 

| Group | Population trajectory | Sample trajectory |
|---|---|---|
| Group 1 | \(\phi _{X}(t)\times 45,~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z,~ X \sim N(U(2,8),1), ~Z \sim N(45,1)\) |
| Group 2 | \(\phi _{X}(t)\times 45,~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z,~ X \sim N(U(2,8),1),~ Z \sim N(45,1)\) |
| Group 3 | \(\phi _{X}(t)\times 35, ~ X \sim N(5,1)\) | \(\phi _{X}(t)\times Z, ~ X \sim N(U(2,8),1), ~Z \sim N(35,1)\) |

1.7 Setting 5

In addition to the two variables defined in Setting 2, we include three additional noise variables. All points in the trajectories are generated from \(N(7.5, 2.8^2)\) to match all ranges of the noise variables around 15 (see Fig. 9).
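The noise variables can be sketched in a few lines. This example assumes that each point of a noise trajectory is an independent draw from \(N(7.5, 2.8^2)\), which is how the sentence above reads, and that there are 20 time points, which is an illustrative choice. The function name `noise_variables` is hypothetical. A spread of about 15 around the mean 7.5 corresponds to roughly \(\pm 2.7\) standard deviations.

```python
import random

def noise_variables(n_variables, n_times, rng, mean=7.5, sd=2.8):
    """Return n_variables noise trajectories of independent N(mean, sd^2) points."""
    return [
        [rng.gauss(mean, sd) for _ in range(n_times)]
        for _ in range(n_variables)
    ]

rng = random.Random(1)
noise = noise_variables(n_variables=3, n_times=20, rng=rng)  # Setting 5 adds three
```

Because these trajectories carry no group information, a sparse clustering method would be expected to give them weights near zero.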

1.8 Setting 6

In addition to the three variables in Setting 3, we include seven additional noise variables. All points in the trajectories are generated from \(N(7.5, 2.8^2)\) to match all ranges of the noise variables around 15.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Kang, I., Choi, H., Yoon, Y.J. et al. Fréchet distance-based cluster analysis for multi-dimensional functional data. Stat Comput 33, 75 (2023). https://doi.org/10.1007/s11222-023-10237-z

  • DOI: https://doi.org/10.1007/s11222-023-10237-z
