
Spectral methods for growth curve clustering


Abstract

The growth curve clustering problem is analyzed and its connection with the spectral relaxation method is described. For a given set of growth curves and a similarity function, a similarity matrix is defined, from which the corresponding similarity graph is constructed. It is shown that a nearly optimal partition of the growth curves can be obtained from the eigendecomposition of a specific matrix associated with the similarity graph. The results are illustrated and analyzed on a set of synthetically generated growth curves, and one real-world problem is also presented.
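
As a rough sketch of the pipeline the abstract describes, the following code follows the standard normalized spectral clustering recipe of Ng et al. (2001) and von Luxburg (2007); the Gaussian similarity function, the symmetric normalized Laplacian, and all names here are illustrative assumptions rather than the paper's exact construction.

```python
# A minimal sketch of normalized spectral clustering (Ng et al. 2001;
# von Luxburg 2007). The Gaussian similarity and the choice of Laplacian
# are illustrative assumptions, not necessarily the paper's exact matrix.
import numpy as np
from sklearn.cluster import KMeans

def spectral_partition(curves, k, sigma=1.0):
    """Partition sampled growth curves (one curve per row) into k clusters."""
    # Similarity matrix from pairwise squared distances between curves.
    d2 = ((curves[:, None, :] - curves[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Embed each curve via the eigenvectors of the k smallest eigenvalues,
    # normalize the rows, and finish with k-means in the spectral embedding.
    _, V = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = V[:, :k]
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```

With well-separated clusters, the rows of the spectral embedding concentrate around k nearly orthogonal directions, which is why plain k-means suffices as the final step.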


References

  • Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell

  • Bolla M (2011) Penalized versions of the Newman–Girvan modularity and their relation to normalized cuts and \(k\)-means clustering. Phys Rev E 84:016108

  • Chamroukhi F (2016) Piecewise regression mixture for simultaneous functional data clustering and optimal segmentation. J Classif 33:374–411

  • Ćwik M, Józefczyk J (2017) Heuristic algorithms for the minmax regret flow-shop problem with interval processing times. Cent Eur J Oper Res. https://doi.org/10.1007/s10100-017-0485-8

  • Diestel R (2000) Graph theory, electronic edition. Springer, New York

  • Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, Hoboken

  • Gaffney S (2004) Probabilistic curve-aligned clustering and prediction with mixture models. PhD thesis, Department of Computer Science, University of California, Irvine

  • Gan G, Wu J, Ma C (2007) Data clustering: theory, algorithms, and applications. SIAM, Philadelphia

  • Golub GH, Van Loan CF (1996) Matrix computations. Johns Hopkins University Press, Baltimore, p 320

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218

  • Jacques J, Preda C (2014) Functional data clustering: a survey. Adv Data Anal Classif 8:231–255

  • Jain AK (2010) Data clustering: 50 years beyond \(k\)-means. Pattern Recognit Lett 31:651–666

  • Janković M, Leko A, Šuvak N (2016) Application of lactation models on dairy cow farms. Croat Oper Res Rev 7:217–227

  • Jukić D, Scitovski R (2003) Solution of the least-squares problem for logistic function. J Comput Appl Math 156:159–177

  • Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, Cambridge

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416

  • Majstorović S, Stevanović D (2014) A note on graphs whose largest eigenvalue of the modularity matrix equals zero. Electron J Linear Algebra 27:611–618

  • Marošević T, Sabo K, Taler P (2013) A mathematical model for uniform distribution of voters per constituencies. Croat Oper Res Rev 4:63–64

  • Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113

  • Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856

  • Park J, Ahn J (2017) Clustering multivariate functional data with phase variation. Biometrics 73:324–333

  • Ratkowsky DA (1990) Handbook of nonlinear regression models. Marcel Dekker, New York

  • Sabo K (2014) Center-based \(l_1\)-clustering method. Int J Appl Math Comput Sci 24:151–163

  • Sangalli LM, Secchi P, Vantini S, Veneziani A (2009) A case study in exploratory functional data analysis: geometrical features of the internal carotid artery. J Am Stat Assoc 104:37–48

  • Sangalli LM, Secchi P, Vantini S, Vitelli V (2010) \(K\)-mean alignment for curve clustering. Comput Stat Data Anal 54:1219–1233

  • Scitovski R, Scitovski S (2013) A fast partitioning algorithm and its application to earthquake investigation. Comput Geosci 59:124–131

  • Su Z, Kogan J, Nicholas C (2010) Constrained clustering with \(k\)-means type algorithms. In: Berry MW, Kogan J (eds) Text mining: applications and theory. Wiley, Chichester, pp 81–103

  • Tarpey T, Kinateder KKJ (2003) Clustering functional data. J Classif 20:93–114

  • Turkalj Ž, Markulak D, Singer S, Scitovski R (2016) Research project grouping and ranking by using adaptive Mahalanobis clustering. Croat Oper Res Rev 7:81–96

  • Vincek D, Kralik G, Kušec G, Sabo K, Scitovski R (2012a) Application of growth functions in the prediction of live weight of domestic animals. Cent Eur J Oper Res 20:719–733

  • Vincek D, Sabo K, Kušec G, Kralik G, Djurkin I, Scitovski R (2012b) Modeling of pig growth by S-function—least absolute deviation approach for parameter estimation. Archiv für Tierzucht 55:364–374

  • Wagner S, Wagner D (2007) Comparing clusterings – an overview. https://publikationen.bibliothek.kit.edu/1000011477/812079. Accessed 14 Sept 2016

  • Zhang Z, Pati D, Srivastava A (2015) Bayesian clustering of shapes of curves. J Stat Plan Inference 166:171–186


Author information

Correspondence to Snježana Majstorović.

Appendix A: Adjusted Rand index

Let S be a set of n data items, \(S=\{x_1,x_2,\ldots ,x_n\}\), and let \(U=\{U_1,U_2,\ldots ,U_k\}\) and \(V=\{V_1,V_2,\ldots ,V_r\}\) be two partitions of S. The measure of similarity between the partitions U and V is based on counting the pairs of data items that are placed in the same or in different sets in U and V. Each pair \((x_i,x_j)\) of data items is classified into one of four groups according to its comembership in U and V, which yields the pair counts derived from the contingency table. The contingency table is the \(k\times r\) matrix of all possible overlaps between the clusters of U and V, whose ijth entry is the size of the intersection of clusters \(U_i\) and \(V_j\), that is, \(n_{ij}=|U_i\cap V_j|\):

$$\begin{aligned} \begin{array}{c|cccc|c} & V_1 & V_2 & \ldots & V_r & \text{Marginal sums}\\ \hline U_1 & n_{11} & n_{12} & \ldots & n_{1r} & n_{1.}\\ U_2 & n_{21} & n_{22} & \ldots & n_{2r} & n_{2.}\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ U_k & n_{k1} & n_{k2} & \ldots & n_{kr} & n_{k.}\\ \hline \text{Marginal sums} & n_{.1} & n_{.2} & \ldots & n_{.r} & n \end{array} \end{aligned}$$
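
For concreteness, the contingency table can be computed directly from two label vectors. A minimal sketch in Python (the function name and the NumPy-based encoding are illustrative assumptions):

```python
import numpy as np

def contingency_table(u_labels, v_labels):
    """k x r matrix with entries n_ij = |U_i ∩ V_j| for two labelings of n items."""
    _, u_idx = np.unique(u_labels, return_inverse=True)  # map labels to 0..k-1
    _, v_idx = np.unique(v_labels, return_inverse=True)  # map labels to 0..r-1
    table = np.zeros((u_idx.max() + 1, v_idx.max() + 1), dtype=int)
    np.add.at(table, (u_idx, v_idx), 1)  # count items falling in U_i ∩ V_j
    return table
```

The row sums of the returned matrix are the marginals \(n_{i.}\) and the column sums are \(n_{.j}\).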

In the case of disjoint clusters we have \(n_{i.}=|U_i|\) and \(n_{.j}=|V_j|\). Let us define the following numbers:

  • a—the number of pairs of elements in S that are in the same set in U and in the same set in V;

  • b—the number of pairs of elements in S that are in different sets in U and in different sets in V;

  • c—the number of pairs of elements in S that are in the same set in U and in different sets in V;

  • d—the number of pairs of elements in S that are in different sets in U and in the same set in V.

The Rand index is a measure of similarity between U and V defined as

$$\begin{aligned} RI=\frac{a+b}{a+b+c+d}=\frac{a+b}{\binom{n}{2}}. \end{aligned}$$

The numbers a, b, c and d can easily be obtained by simple combinatorial arguments, as sketched below.
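
The pair counts can be read off the marginals of the contingency table: \(a\) comes from the entries \(n_{ij}\), \(a+c\) from the row sums, and \(a+d\) from the column sums. A minimal sketch, assuming the contingency_table helper above:

```python
from math import comb

def rand_index(table):
    """Rand index from a contingency table (NumPy array of counts n_ij)."""
    n = int(table.sum())
    a = sum(comb(int(nij), 2) for nij in table.ravel())           # same in U and V
    a_plus_c = sum(comb(int(ni), 2) for ni in table.sum(axis=1))  # same in U
    a_plus_d = sum(comb(int(nj), 2) for nj in table.sum(axis=0))  # same in V
    total = comb(n, 2)                                            # a + b + c + d
    b = total - a_plus_c - a_plus_d + a                           # different in both
    return (a + b) / total
```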

The Rand index has a fixed range of [0, 1]. A drawback of this type of similarity measure is that its expected value over two random partitions is not a constant (say, zero). The adjusted Rand index ARI was proposed by Hubert and Arabie (1985) under the assumption that the contingency table is constructed randomly while the cluster sizes in U and V are held fixed. The ARI is obtained by normalizing the difference between the RI-type count and its expected value against an upper bound on the index:

$$\begin{aligned} ARI=\frac{\sum _{i=1}^k\sum _{j=1}^r\binom{n_{ij}}{2}-\sum _{i=1}^k\binom{n_{i.}}{2}\sum _{j=1}^r\binom{n_{.j}}{2}\Big/\binom{n}{2}}{\frac{1}{2}\left[ \sum _{i=1}^k\binom{n_{i.}}{2}+\sum _{j=1}^r\binom{n_{.j}}{2}\right] -\sum _{i=1}^k\binom{n_{i.}}{2}\sum _{j=1}^r\binom{n_{.j}}{2}\Big/\binom{n}{2}}. \end{aligned}$$

It is the normalized difference of the Rand index and its expected value under the null hypothesis (Wagner and Wagner 2007).
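
A direct transcription of this formula, again assuming the contingency_table helper above; as a cross-check, sklearn.metrics.adjusted_rand_score computes the same quantity directly from the two label vectors:

```python
from math import comb

def adjusted_rand_index(table):
    """ARI from a contingency table, transcribing the formula above."""
    n = int(table.sum())
    sum_ij = sum(comb(int(nij), 2) for nij in table.ravel())
    sum_i = sum(comb(int(ni), 2) for ni in table.sum(axis=1))
    sum_j = sum(comb(int(nj), 2) for nj in table.sum(axis=0))
    expected = sum_i * sum_j / comb(n, 2)  # expected count under fixed marginals
    max_index = (sum_i + sum_j) / 2.0      # upper bound used for normalization
    return (sum_ij - expected) / (max_index - expected)
```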


About this article


Cite this article

Majstorović, S., Sabo, K., Jung, J. et al. Spectral methods for growth curve clustering. Cent Eur J Oper Res 26, 715–737 (2018). https://doi.org/10.1007/s10100-017-0515-6
