Abstract
The growth curve clustering problem is analyzed and its connection with the spectral relaxation method is described. For a given set of growth curves and a similarity function, a similarity matrix is defined, from which the corresponding similarity graph is constructed. It is shown that a nearly optimal growth curve partition can be obtained from the eigendecomposition of a specific matrix associated with the similarity graph. The results are illustrated and analyzed on a set of synthetically generated growth curves. One real-world problem is also considered.
References
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell
Bolla M (2011) Penalized versions of the Newman-Girvan modularity and their relation to normalized cuts and \(k\)-means clustering. Phys Rev E 84:016108
Chamroukhi F (2016) Piecewise regression mixture for simultaneous functional data clustering and optimal segmentation. J Classif 33:374–411
Ćwik M, Józefczyk J (2017) Heuristic algorithms for the minmax regret flow-shop problem with interval processing times. Cent Eur J Oper Res. https://doi.org/10.1007/s10100-017-0485-8
Diestel R (2000) Graph theory. Electronic edition 2000. Springer, New York
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, Hoboken
Gaffney S (2004) Probabilistic curve-aligned clustering and prediction with mixture models. Ph.D. thesis, Department of Computer Science, University of California, Irvine, USA
Gan G, Wu J, Ma C (2007) Data clustering: theory, algorithms, and applications. SIAM, Philadelphia
Golub GH, Van Loan CF (1996) Matrix computations. Johns Hopkins University Press, Baltimore, p 320
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Jacques J, Preda C (2014) Functional data clustering: a survey. Adv Data Anal Classif 8:231–255
Jain AK (2010) Data clustering: 50 years beyond \(k\)-means. Pattern Recognit Lett 31:651–666
Janković M, Leko A, Šuvak N (2016) Application of lactation models on dairy cow farms. Croat Oper Res Rev 7:217–227
Jukić D, Scitovski R (2003) Solution of the least-squares problem for logistic function. J Comput Appl Math 156:159–177
Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, Cambridge
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416
Majstorović S, Stevanović D (2014) A note on graphs whose largest eigenvalue of the modularity matrix equals zero. Electron J Linear Algebra 27:611–618
Marošević T, Sabo K, Taler P (2013) A mathematical model for uniform distribution of voters per constituencies. Croat Oper Res Rev 4:63–64
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
Park J, Ahn J (2017) Clustering multivariate functional data with phase variation. Biometrics 73:324–333
Ratkowsky DA (1990) Handbook of nonlinear regression models. Marcel Dekker, New York
Sabo K (2014) Center-based \(l_1\)-clustering method. Int J Appl Math Comput Sci 24:151–163
Sangalli LM, Secchi P, Vantini S, Veneziani A (2009) A case study in exploratory functional data analysis: geometrical features of the internal carotid artery. J Am Stat Assoc 104:37–48
Sangalli LM, Secchi P, Vantini S, Vitelli V (2010) \(K\)-mean alignment for curve clustering. Comput Stat Data Anal 54:1219–1233
Scitovski R, Scitovski S (2013) A fast partitioning algorithm and its application to earthquake investigation. Comput Geosci 59:124–131
Su Z, Kogan J, Nicholas C (2010) Constrained clustering with \(k\)-means type algorithms. In: Berry MW, Kogan J (eds) Text mining applications and theory. Wiley, Chichester, pp 81–103
Tarpey T, Kinateder KKJ (2003) Clustering functional data. J Classif 20:93–114
Turkalj Ž, Markulak D, Singer S, Scitovski R (2016) Research project grouping and ranking by using adaptive Mahalanobis clustering. Croat Oper Res Rev 7:81–96
Vincek D, Kralik G, Kušec G, Sabo K, Scitovski R (2012a) Application of growth functions in the prediction of live weight of domestic animals. Cent Eur J Oper Res 20:719–733
Vincek D, Sabo K, Kušec G, Kralik G, Djurkin I, Scitovski R (2012b) Modeling of pig growth by S-function—least absolute deviation approach for parameter estimation. Archiv für Tierzucht 55:364–374
Wagner S, Wagner D (2007) Comparing clusterings—an overview. https://publikationen.bibliothek.kit.edu/1000011477/812079. Accessed 14 Sept 2016
Zhang Z, Pati D, Srivastava A (2015) Bayesian clustering of shapes of curves. J Stat Plan Inference 166:171–186
Appendix A: Adjusted Rand index
Let \(S=\{x_1,x_2,\ldots ,x_n\}\) be a set of n data items, and let \(U=\{U_1,U_2,\ldots ,U_k\}\) and \(V=\{V_1,V_2,\ldots ,V_r\}\) be two partitions of S. The similarity between partitions U and V is measured by counting pairs of data items that fall into the same or different partition sets in U and V. Each pair \((x_i,x_j)\) of data items is classified into one of four groups according to its comembership in U and V; the resulting pair counts are derived from the contingency table. The contingency table is a \(k\times r\) matrix of all possible overlaps between the clusters of U and V, whose ijth entry is the size of the intersection of clusters \(U_i\) and \(V_j\), that is, \(n_{ij}=|U_i\cap V_j|\).
|  | \(V_1\) | \(V_2\) | \(\ldots \) | \(V_r\) | Marginal sums |
|---|---|---|---|---|---|
| \(U_1\) | \(n_{11}\) | \(n_{12}\) | \(\ldots \) | \(n_{1r}\) | \(n_{1.}\) |
| \(U_2\) | \(n_{21}\) | \(n_{22}\) | \(\ldots \) | \(n_{2r}\) | \(n_{2.}\) |
| \(\vdots \) | \(\vdots \) | \(\vdots \) | \(\ddots \) | \(\vdots \) | \(\vdots \) |
| \(U_k\) | \(n_{k1}\) | \(n_{k2}\) | \(\ldots \) | \(n_{kr}\) | \(n_{k.}\) |
| Marginal sums | \(n_{.1}\) | \(n_{.2}\) | \(\ldots \) | \(n_{.r}\) | \(n\) |
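As a brief illustration (a minimal sketch, not part of the original article; the helper name is hypothetical), the contingency table can be computed from two partitions represented as parallel label sequences:

```python
from collections import Counter

def contingency_table(labels_u, labels_v):
    """Return the k x r matrix with entries n_ij = |U_i ∩ V_j|,
    given two partitions of the same items as parallel label sequences."""
    counts = Counter(zip(labels_u, labels_v))
    rows = sorted(set(labels_u))   # cluster labels of U
    cols = sorted(set(labels_v))   # cluster labels of V
    return [[counts[(i, j)] for j in cols] for i in rows]

# U = {{x1, x2}, {x3, x4}},  V = {{x1}, {x2, x3, x4}}
print(contingency_table([1, 1, 2, 2], [1, 2, 2, 2]))  # [[1, 1], [0, 2]]
```

The row sums of the result give the marginals \(n_{i.}=|U_i|\) and the column sums give \(n_{.j}=|V_j|\).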
In the case of disjoint clusters we have \(n_{i.}=|U_i|\) and \(n_{.j}=|V_j|\). Let us define the following numbers:
- \(a\): the number of pairs of elements in S that are in the same set in U and in the same set in V;
- \(b\): the number of pairs of elements in S that are in different sets in U and in different sets in V;
- \(c\): the number of pairs of elements in S that are in the same set in U and in different sets in V;
- \(d\): the number of pairs of elements in S that are in different sets in U and in the same set in V.
The Rand index is a measure of similarity between U and V defined as
\(RI(U,V)=\dfrac{a+b}{a+b+c+d}=\dfrac{a+b}{\binom{n}{2}}.\)
The numbers a, b, c and d can easily be obtained by simple combinatorial methods.
The Rand index has a fixed range of [0, 1]. A drawback of this type of similarity measure is that its expected value over two random partitions is not a constant (say, zero). The adjusted Rand index (ARI) was proposed by Hubert and Arabie (1985) under the assumption that the contingency table is constructed randomly with the cluster sizes in U and V held fixed. The ARI is calculated from the upper bound 1 on the RI and its expected value:
\(ARI=\dfrac{RI-E[RI]}{1-E[RI]}=\dfrac{\sum_{i,j}\binom{n_{ij}}{2}-\left[\sum_{i}\binom{n_{i.}}{2}\sum_{j}\binom{n_{.j}}{2}\right]\big/\binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{n_{i.}}{2}+\sum_{j}\binom{n_{.j}}{2}\right]-\left[\sum_{i}\binom{n_{i.}}{2}\sum_{j}\binom{n_{.j}}{2}\right]\big/\binom{n}{2}}.\)
It is the normalized difference of the Rand index and its expected value under the null hypothesis (Wagner and Wagner 2007).
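As an illustrative sketch (hypothetical helper names, not from the article), the pair counts, the Rand index and the ARI can be computed directly from a contingency table given as a list of rows:

```python
from math import comb

def rand_index(table):
    """Rand index RI = (a + b) / C(n, 2) from a contingency table."""
    n = sum(sum(row) for row in table)
    pairs = comb(n, 2)
    a = sum(comb(nij, 2) for row in table for nij in row)   # same in U, same in V
    c = sum(comb(sum(row), 2) for row in table) - a         # same in U, different in V
    d = sum(comb(s, 2) for s in map(sum, zip(*table))) - a  # different in U, same in V
    b = pairs - a - c - d                                   # different in both
    return (a + b) / pairs

def adjusted_rand_index(table):
    """ARI of Hubert and Arabie (1985) from a contingency table."""
    n = sum(sum(row) for row in table)
    sum_ij = sum(comb(nij, 2) for row in table for nij in row)
    sum_i = sum(comb(sum(row), 2) for row in table)          # row marginals n_i.
    sum_j = sum(comb(s, 2) for s in map(sum, zip(*table)))   # column marginals n_.j
    expected = sum_i * sum_j / comb(n, 2)                    # expectation under the null model
    max_index = (sum_i + sum_j) / 2                          # upper bound on sum_ij
    return (sum_ij - expected) / (max_index - expected)

print(rand_index([[2, 0], [0, 2]]))           # 1.0 for identical partitions
print(adjusted_rand_index([[2, 0], [0, 2]]))  # 1.0 for identical partitions
```

Note that for two random partitions the RI stays well above zero while the ARI is close to zero, which is the motivation for the adjustment.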
Majstorović, S., Sabo, K., Jung, J. et al. Spectral methods for growth curve clustering. Cent Eur J Oper Res 26, 715–737 (2018). https://doi.org/10.1007/s10100-017-0515-6