Abstract
The growth curve clustering problem is analyzed and its connection with the spectral relaxation method is described. For a given set of growth curves and a similarity function, a similarity matrix is defined, from which the corresponding similarity graph is constructed. It is shown that a nearly optimal growth curve partition can be obtained from the eigendecomposition of a specific matrix associated with the similarity graph. The results are illustrated and analyzed on a set of synthetically generated growth curves. One real-world problem is also considered.
References
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell
Bolla M (2011) Penalized versions of the Newman-Girvan modularity and their relation to normalized cuts and \(k\)-means clustering. Phys Rev E 84:016108
Chamroukhi F (2016) Piecewise regression mixture for simultaneous functional data clustering and optimal segmentation. J Classif 33:374–411
Ćwik M, Józefczyk J (2017) Heuristic algorithms for the minmax regret flow-shop problem with interval processing times. Cent Eur J Oper Res. https://doi.org/10.1007/s10100-017-0485-8
Diestel R (2000) Graph theory. Electronic edition 2000. Springer, New York
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, Hoboken
Gaffney S (2004) Probabilistic curve-aligned clustering and prediction with mixture models. Ph.D. thesis, Department of Computer Science, University of California, Irvine, USA
Gan G, Wu J, Ma C (2007) Data clustering: theory, algorithms, and applications. SIAM, Philadelphia
Golub GH, Van Loan CF (1996) Matrix computations. Johns Hopkins University Press, Baltimore, p 320
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
Jacques J, Preda C (2014) Functional data clustering: a survey. Adv Data Anal Classif 8:231–255
Jain AK (2010) Data clustering: 50 years beyond \(k\)-means. Pattern Recognit Lett 31:651–666
Janković M, Leko A, Šuvak N (2016) Application of lactation models on dairy cow farms. Croat Oper Res Rev 7:217–227
Jukić D, Scitovski R (2003) Solution of the least-squares problem for logistic function. J Comput Appl Math 156:159–177
Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, Cambridge
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416
Majstorović S, Stevanović D (2014) A note on graphs whose largest eigenvalue of the modularity matrix equals zero. Electron J Linear Algebra 27:611–618
Marošević T, Sabo K, Taler P (2013) A mathematical model for uniform distribution of voters per constituencies. Croat Oper Res Rev 4:63–64
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
Park J, Ahn J (2017) Clustering multivariate functional data with phase variation. Biometrics 73:324–333
Ratkowsky DA (1990) Handbook of nonlinear regression models. Marcel Dekker, New York
Sabo K (2014) Center-based \(l_1\)-clustering method. Int J Appl Math Comput Sci 24:151–163
Sangalli LM, Secchi P, Vantini S, Veneziani A (2009) A case study in exploratory functional data analysis: geometrical features of the internal carotid artery. J Am Stat Assoc 104:37–48
Sangalli LM, Secchi P, Vantini S, Vitelli V (2010) \(K\)-mean alignment for curve clustering. Comput Stat Data Anal 54:1219–1233
Scitovski R, Scitovski S (2013) A fast partitioning algorithm and its application to earthquake investigation. Comput Geosci 59:124–131
Su Z, Kogan J, Nicholas C (2010) Constrained clustering with \(k\)-means type algorithms. In: Berry MW, Kogan J (eds) Text mining applications and theory. Wiley, Chichester, pp 81–103
Tarpey T, Kinateder KKJ (2003) Clustering functional data. J Classif 20:93–114
Turkalj Ž, Markulak D, Singer S, Scitovski R (2016) Research project grouping and ranking by using adaptive Mahalanobis clustering. Croat Oper Res Rev 7:81–96
Vincek D, Kralik G, Kušec G, Sabo K, Scitovski R (2012a) Application of growth functions in the prediction of live weight of domestic animals. Cent Eur J Oper Res 20:719–733
Vincek D, Sabo K, Kušec G, Kralik G, Djurkin I, Scitovski R (2012b) Modeling of pig growth by S-function—least absolute deviation approach for parameter estimation. Archiv für Tierzucht 55:364–374
Wagner S, Wagner D (2007) Comparing clusterings—an overview. https://publikationen.bibliothek.kit.edu/1000011477/812079. Accessed 14 Sept 2016
Zhang Z, Pati D, Srivastava A (2015) Bayesian clustering of shapes of curves. J Stat Plan Inference 166:171–186
Appendix A: Adjusted Rand index
Let \(S=\{x_1,x_2,\ldots ,x_n\}\) be a set of n data items, and let \(U=\{U_1,U_2,\ldots ,U_k\}\) and \(V=\{V_1,V_2,\ldots ,V_r\}\) be two partitions of S. The similarity between partitions U and V is measured by counting pairs of data items that fall into the same or different partition sets in U and V. Each pair \((x_i,x_j)\) of data items is classified into one of four groups according to its comembership in U and V; the resulting pair counts are derived from the contingency table. The contingency table is a \(k\times r\) matrix of all possible overlaps between the clusters of U and V, whose ijth entry is the size of the intersection of clusters \(U_i\) and \(V_j\), that is, \(n_{ij}=|U_i\cap V_j|\).
|  | \(V_1\) | \(V_2\) | \(\ldots \) | \(V_r\) | Marginal sums |
|---|---|---|---|---|---|
| \(U_1\) | \(n_{11}\) | \(n_{12}\) | \(\ldots \) | \(n_{1r}\) | \(n_{1.}\) |
| \(U_2\) | \(n_{21}\) | \(n_{22}\) | \(\ldots \) | \(n_{2r}\) | \(n_{2.}\) |
| \(\vdots \) | \(\vdots \) | \(\vdots \) | \(\ddots \) | \(\vdots \) | \(\vdots \) |
| \(U_k\) | \(n_{k1}\) | \(n_{k2}\) | \(\ldots \) | \(n_{kr}\) | \(n_{k.}\) |
| Marginal sums | \(n_{.1}\) | \(n_{.2}\) | \(\ldots \) | \(n_{.r}\) | \(n\) |
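As a brief illustration (a minimal sketch, not part of the original article; the helper name is hypothetical), the contingency table can be computed from two partitions represented as parallel label sequences:

```python
from collections import Counter

def contingency_table(labels_u, labels_v):
    """Return the k x r matrix with entries n_ij = |U_i ∩ V_j|,
    given two partitions of the same items as parallel label sequences."""
    counts = Counter(zip(labels_u, labels_v))
    rows = sorted(set(labels_u))   # cluster labels of U
    cols = sorted(set(labels_v))   # cluster labels of V
    return [[counts[(i, j)] for j in cols] for i in rows]

# U = {{x1, x2}, {x3, x4}},  V = {{x1}, {x2, x3, x4}}
print(contingency_table([1, 1, 2, 2], [1, 2, 2, 2]))  # [[1, 1], [0, 2]]
```

The row sums of the result give the marginals \(n_{i.}=|U_i|\) and the column sums give \(n_{.j}=|V_j|\).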
In the case of disjoint clusters we have \(n_{i.}=|U_i|\) and \(n_{.j}=|V_j|\). Let us define the following numbers:
- \(a\): the number of pairs of elements in S that are in the same set in U and in the same set in V;
- \(b\): the number of pairs of elements in S that are in different sets in U and in different sets in V;
- \(c\): the number of pairs of elements in S that are in the same set in U and in different sets in V;
- \(d\): the number of pairs of elements in S that are in different sets in U and in the same set in V.
The Rand index is a measure of similarity between U and V defined as
\(RI(U,V)=\dfrac{a+b}{a+b+c+d}=\dfrac{a+b}{\binom{n}{2}}.\)
The numbers a, b, c and d can easily be obtained by simple combinatorial methods.
The Rand index has a fixed range of [0, 1]. A drawback of this type of similarity measure is that its expected value over two random partitions is not a constant (say, zero). The adjusted Rand index (ARI) was proposed by Hubert and Arabie (1985) under the assumption that the contingency table is constructed randomly with the cluster sizes in U and V held fixed. The ARI is calculated from the upper bound 1 on the RI and its expected value:
\(ARI=\dfrac{RI-E[RI]}{1-E[RI]}=\dfrac{\sum_{i,j}\binom{n_{ij}}{2}-\left[\sum_{i}\binom{n_{i.}}{2}\sum_{j}\binom{n_{.j}}{2}\right]\big/\binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{n_{i.}}{2}+\sum_{j}\binom{n_{.j}}{2}\right]-\left[\sum_{i}\binom{n_{i.}}{2}\sum_{j}\binom{n_{.j}}{2}\right]\big/\binom{n}{2}}.\)
It is the normalized difference of the Rand index and its expected value under the null hypothesis (Wagner and Wagner 2007).
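As an illustrative sketch (hypothetical helper names, not from the article), the pair counts, the Rand index and the ARI can be computed directly from a contingency table given as a list of rows:

```python
from math import comb

def rand_index(table):
    """Rand index RI = (a + b) / C(n, 2) from a contingency table."""
    n = sum(sum(row) for row in table)
    pairs = comb(n, 2)
    a = sum(comb(nij, 2) for row in table for nij in row)   # same in U, same in V
    c = sum(comb(sum(row), 2) for row in table) - a         # same in U, different in V
    d = sum(comb(s, 2) for s in map(sum, zip(*table))) - a  # different in U, same in V
    b = pairs - a - c - d                                   # different in both
    return (a + b) / pairs

def adjusted_rand_index(table):
    """ARI of Hubert and Arabie (1985) from a contingency table."""
    n = sum(sum(row) for row in table)
    sum_ij = sum(comb(nij, 2) for row in table for nij in row)
    sum_i = sum(comb(sum(row), 2) for row in table)          # row marginals n_i.
    sum_j = sum(comb(s, 2) for s in map(sum, zip(*table)))   # column marginals n_.j
    expected = sum_i * sum_j / comb(n, 2)                    # expectation under the null model
    max_index = (sum_i + sum_j) / 2                          # upper bound on sum_ij
    return (sum_ij - expected) / (max_index - expected)

print(rand_index([[2, 0], [0, 2]]))           # 1.0 for identical partitions
print(adjusted_rand_index([[2, 0], [0, 2]]))  # 1.0 for identical partitions
```

Note that for two random partitions the RI stays well above zero while the ARI is close to zero, which is the motivation for the adjustment.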
Majstorović, S., Sabo, K., Jung, J. et al. Spectral methods for growth curve clustering. Cent Eur J Oper Res 26, 715–737 (2018). https://doi.org/10.1007/s10100-017-0515-6