
Hierarchical clustering of subpopulations with a dissimilarity based on the likelihood ratio statistic: application to clustering massive data sets

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

The problem of clustering subpopulations on the basis of samples is considered within a statistical framework: a distribution for the variables is assumed for each subpopulation, and the dissimilarity between any two subpopulations is defined as the likelihood ratio statistic comparing the hypothesis that the two subpopulations differ in the parameters of their distributions against the hypothesis that they do not. A general algorithm for the construction of a hierarchical classification is described, with the important property that the dendrogram has no inversions. The essential elements of the algorithm are specified for well-known distributions (normal, multinomial and Poisson), and an outline of the general parametric case is also discussed. Several applications are presented, the main one being a novel two-step approach to massive data: after clustering the data into a reasonable number of ‘bins’ by a fast algorithm such as k-Means, a version of our algorithm is applied to the resulting bins. Multivariate normality of the means computed on each bin is assumed; this is justified by the central limit theorem and by the assumption that each bin contains a large number of units, which generally holds for truly massive data such as currently found in modern data analysis. No assumption, however, is made about the data-generating distribution.
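The two-step approach can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: all function names are ours, the binning step is a plain Lloyd's k-Means iteration, and the merging step uses the normal-case dissimilarity and inverse-variance pooling rules given in the Appendix (Eqs. 45 and 53).

```python
import numpy as np

def kmeans_bins(X, k, iters=50, seed=0):
    """Step 1: coarse binning of the raw data with Lloyd's k-Means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bin_summaries(X, labels):
    """Per-bin mean and variance of the mean, V_s = S_s / n_s."""
    means, Vs = [], []
    for j in np.unique(labels):
        Xj = X[labels == j]
        means.append(Xj.mean(axis=0))
        Vs.append(np.cov(Xj.T) / len(Xj))
    return means, Vs

def lr_dissimilarity(m_s, V_s, m_t, V_t):
    """d(B_s, B_t) = (xbar_s - xbar_t)^T (V_s + V_t)^{-1} (xbar_s - xbar_t), Eq. (45)."""
    diff = m_s - m_t
    return float(diff @ np.linalg.solve(V_s + V_t, diff))

def agglomerate(means, Vs):
    """Step 2: greedily merge the two closest bins, pooling by Eq. (53):
    V^{-1} = V_s^{-1} + V_t^{-1},  xbar = V (V_s^{-1} xbar_s + V_t^{-1} xbar_t)."""
    means, Vs = list(means), list(Vs)
    merge_heights = []
    while len(means) > 1:
        pairs = [(lr_dissimilarity(means[s], Vs[s], means[t], Vs[t]), s, t)
                 for s in range(len(means)) for t in range(s + 1, len(means))]
        d, s, t = min(pairs)
        V = np.linalg.inv(np.linalg.inv(Vs[s]) + np.linalg.inv(Vs[t]))
        m = V @ (np.linalg.solve(Vs[s], means[s]) + np.linalg.solve(Vs[t], means[t]))
        for idx in sorted((s, t), reverse=True):
            del means[idx]; del Vs[idx]
        means.append(m); Vs.append(V)
        merge_heights.append(d)
    return merge_heights
```

The expensive pass over the raw data happens only once, in `kmeans_bins`; the hierarchical step then works on the bin summaries alone, which is what makes the scheme practical for massive data sets.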


Notes

  1. The proofs of this and the other propositions of this paper are given in Sect. 7.


Acknowledgements

We wish to thank Dr. F. Clavel for having shared the EPIC French data with us and Dr. E. Riboli for general assistance in becoming familiar with EPIC. The authors gratefully acknowledge the hospitality of the members of the Department of Epidemiology and Statistics during the visits of M. Castejón and A. González to McGill University. M. Castejón also thanks INRIA members for their hospitality during his visit. We gratefully acknowledge support from the Ministerio de Educación y Ciencia de España, Dirección General de Investigación, through the DPI2006-14784 and DPI2007-61090 research projects.

Author information


Correspondence to Antonio Ciampi.

Appendix: Proofs


Proof of Proposition 1

Since \(d(A_i, A_j) > 0\), we have \(f(A_r) > 0\) for \(r = 1, 2, \ldots, M\). Thus \(f(A_i \cup A_j) > f(A_i) + f(A_j)\) and, obviously, \(f(A_i \cup A_j) > \max \{f(A_i), f(A_j)\}\).

Proof of Proposition 2

$$ d\left(B_s,B_t\right)=\left({\overline{x}}_s-{\overline{x}}_{s \cup t}\right)^{\rm T} V_{s}^{-1} \left({\overline{x}}_s-{\overline{x}}_{s \cup t}\right)+ \left({\overline{x}}_t-{\overline{x}}_{s \cup t}\right)^{\rm T} V_{t}^{-1} \left({\overline{x}}_t-{\overline{x}}_{s \cup t}\right) $$
(41)

Now:

$$ {\overline{x}}_s-{\overline{x}}_{s \cup t} = \left(V_{s}^{-1}+V_{t}^{-1}\right)^{-1} \left[\left(V_{s}^{-1}+V_{t}^{-1}\right){\overline{x}}_s - V_{s}^{-1} {\overline{x}}_s - V_{t}^{-1} {\overline{x}}_t \right] $$
(42)
$$ {\overline{x}}_s-{\overline{x}}_{s \cup t} = \left(V_{s}^{-1}+V_{t}^{-1}\right)^{-1} V_t^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) $$
(43)
$$ {\overline{x}}_t-{\overline{x}}_{s \cup t} = \left(V_{s}^{-1}+V_{t}^{-1}\right)^{-1} V_s^{-1} \left({\overline{x}}_t-{\overline{x}}_s\right) $$
(44)

Therefore:

$$ \begin{aligned} d\left(B_s,B_t\right) &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} V_t^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_s^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_t^{-1}\left({\overline{x}}_s- {\overline{x}}_t\right) \\ &\quad + \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} V_s^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_t^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_s^{-1}\left({\overline{x}}_s- {\overline{x}}_t\right) \\ &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s+V_t\right)^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1} V_t^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) \\ &\quad + \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s+V_t\right)^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1} V_s^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) \\ &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T}\left(V_s+V_t\right)^{-1}\left(V_s^{-1}+V_t^{-1}\right)\left(V_s^{-1}+V_t^{-1}\right)^{-1}\left({\overline{x}}_s-{\overline{x}}_t\right) \\ &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s+V_t\right)^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) \end{aligned} $$
(45)

where we have used

$$ V_s^{-1}\left(V_s^{-1}+V_t^{-1}\right)^{-1}V_t^{-1}= \left(V_s+V_t\right)^{-1} $$
(46)

Notice also that we can write

$$ d\left(B_s,B_t\right)=\left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s^{-1}+V_t^{-1}\right)^{-1} V_s^{-1}V_t^{-1}\left({\overline{x}}_s-{\overline{x}}_t\right) $$
(47)

owing to the commutativity of symmetric matrices.
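The key matrix identity (46) is easy to sanity-check numerically. In the sketch below (ours, not part of the paper), identity (46) is verified for arbitrary symmetric positive-definite matrices, while the rearranged form (47) is checked in the simple commuting case of diagonal matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(p):
    # A A^T plus a ridge term yields a symmetric positive-definite matrix.
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)

p = 4
Vs, Vt = random_spd(p), random_spd(p)
Vs_inv, Vt_inv = np.linalg.inv(Vs), np.linalg.inv(Vt)

# Identity (46): V_s^{-1} (V_s^{-1} + V_t^{-1})^{-1} V_t^{-1} = (V_s + V_t)^{-1}.
lhs = Vs_inv @ np.linalg.inv(Vs_inv + Vt_inv) @ Vt_inv
rhs = np.linalg.inv(Vs + Vt)
assert np.allclose(lhs, rhs)

# Form (47) rearranges the factors and, as the text notes, relies on
# commutativity; diagonal matrices provide a simple commuting case.
diff = rng.normal(size=p)
Ds, Dt = np.diag(rng.uniform(1, 2, p)), np.diag(rng.uniform(1, 2, p))
d45 = diff @ np.linalg.solve(Ds + Dt, diff)                       # Eq. (45)
M47 = (np.linalg.inv(np.linalg.inv(Ds) + np.linalg.inv(Dt))
       @ np.linalg.inv(Ds) @ np.linalg.inv(Dt))
d47 = diff @ M47 @ diff                                           # Eq. (47)
assert np.allclose(d45, d47)
```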

Proof of Proposition 3

$$ d\left(A_i, A_j \right) = \sum_{r \in A_i \cup A_j} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) - \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) $$
(48)
$$ d\left(A_i, A_j \right) = \sum_{i=1}^2 \left\{\sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) - \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)\right\} $$
(49)

Let us develop the first term within the curly bracket above:

$$ \begin{aligned} \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1}\left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) &= \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} + {\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} + {\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &= \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) + \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &\quad + 2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \end{aligned} $$
(50)

Now notice that the last term is zero. This follows from:

$$ \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) = \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} \sum_{r \in A_i} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) $$
(51)

and

$$ \sum_{r \in A_i} V_r^{-1}\left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)= \sum_{r \in A_i} V_r^{-1} {\overline{{\mathbf{x}}}}_{r} - \sum_{r \in A_i} V_r^{-1} {\overline{{\mathbf{x}}}}_{A_i}=V_{A_i}^{-1} {\overline{{\mathbf{x}}}}_{A_i} - \left(V_{A_i} \right)^{-1} {\overline{{\mathbf{x}}}}_{A_i} = 0 $$
(52)

since by definition

$$ V_{A_i}^{-1}= \sum_{r \in A_i} V_r^{-1}, \quad {\overline{{\mathbf{x}}}}_{A_i}=V_{A_i} \sum_{r \in A_i} V_r^{-1} {\overline{{\mathbf{x}}}}_{r} $$
(53)

We now have,

$$ \begin{aligned} d\left(A_i, A_j \right) &= \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) + \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &\quad - \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) \\ &= \sum_{i=1}^2 \left\{\sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \right\} \end{aligned} $$
(54)

and we can proceed as in Proposition 2. To avoid confusion, we write

$$ {{\mathcal{U}}}_r= V_r^{-1}, \quad {{\mathcal{U}}}_{A_i}=\sum_{r \in A_i} V_r^{-1} = \sum_{r \in A_i} {{\mathcal{U}}}_r, \quad {{\mathcal{U}}}_{A_i \cup A_j}={{\mathcal{U}}}_{A_i} + {{\mathcal{U}}}_{A_j} $$

whence

$$ {\overline{{\mathbf{x}}}}_{A_i \cup A_j} = {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i} + {{\mathcal{U}}}_{A_j} {\overline{{\mathbf{x}}}}_{A_j}\right] $$
(55)

and therefore

$$ \begin{aligned} {\overline{{\mathbf{x}}}}_{A_i}-{\overline{{\mathbf{x}}}}_{A_i \cup A_j} &= {{\mathcal{U}}}_{A_i}^{-1} \sum_{r \in A_i} {{\mathcal{U}}}_r {\overline{{\mathbf{x}}}}_{r} - {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i}{\overline{{\mathbf{x}}}}_{A_i}+{{\mathcal{U}}}_{A_j}{\overline{{\mathbf{x}}}}_{A_j}\right] \\ &= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i \cup A_j} {\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_j} {\overline{{\mathbf{x}}}}_{A_j} \right] \\ &= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i}+ {{\mathcal{U}}}_{A_j}{\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_j} {\overline{{\mathbf{x}}}}_{A_j} \right] \\ &= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_j} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \end{aligned} $$
(56)

and similarly

$$ {\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j}= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} \left({\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i} \right) $$
(57)

Substituting into the expression for \(d(A_i, A_j)\) above (Eq. 54):

$$ \begin{aligned} d\left(A_i, A_j \right) &= \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) + \sum_{r \in A_j} \left({\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left(\sum_{r \in A_i} V_r^{-1} \right) {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) + \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left(\sum_{r \in A_j} V_r^{-1} \right) {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) + \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j} \left({{\mathcal{U}}}_{A_i} + {{\mathcal{U}}}_{A_j} \right) {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \end{aligned} $$
(58)

where we have used the commutativity of symmetric matrices, the definition \({{\mathcal{U}}}_{A_i}=\sum_{r \in A_i} V_r^{-1},\) and \({{\mathcal{U}}}_{A_i \cup A_j}={{\mathcal{U}}}_{A_i} + {{\mathcal{U}}}_{A_j}.\) Finally noting that:

$$ {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j}= \left(V_{A_i}^{-1} + V_{A_j}^{-1} \right)^{-1} V_{A_i}^{-1} V_{A_j}^{-1}=\left(V_{A_i} + V_{A_j} \right)^{-1} $$
(59)

we have

$$ d\left(A_i,A_j\right)=\left({\overline{{\mathbf{x}}}}_{A_i}- {\overline{{\mathbf{x}}}}_{A_j}\right)^{\rm T} \left(V_{A_i} + V_{A_j} \right)^{-1} \left({\overline{{\mathbf{x}}}}_{A_i}-{\overline{{\mathbf{x}}}}_{A_j}\right) $$
(60)
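Proposition 3 can likewise be checked numerically: the between-group likelihood ratio statistic computed from its defining sum (Eq. 48) should coincide with the closed form (Eq. 60). The sketch below is ours, with illustrative function names, using randomly generated bin means and positive-definite variance matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3

def random_spd(p):
    # Random symmetric positive-definite variance matrix.
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)

# Two groups of bins, each bin summarized by (mean, variance matrix).
A_i = [(rng.normal(size=p), random_spd(p)) for _ in range(3)]
A_j = [(rng.normal(size=p), random_spd(p)) for _ in range(2)]

def pooled(bins):
    """Eq. (53): V_A^{-1} = sum_r V_r^{-1}, xbar_A = V_A sum_r V_r^{-1} xbar_r."""
    V_A = np.linalg.inv(sum(np.linalg.inv(V) for _, V in bins))
    xbar = V_A @ sum(np.linalg.solve(V, x) for x, V in bins)
    return xbar, V_A

def within(bins, center):
    """Sum of quadratic forms (x_r - center)^T V_r^{-1} (x_r - center)."""
    return sum((x - center) @ np.linalg.solve(V, x - center) for x, V in bins)

x_i, V_i = pooled(A_i)
x_j, V_j = pooled(A_j)
x_u, _ = pooled(A_i + A_j)

# Eq. (48): LR statistic as a difference of within-group sums.
d_sum = within(A_i + A_j, x_u) - within(A_i, x_i) - within(A_j, x_j)

# Eq. (60): closed form.
diff = x_i - x_j
d_closed = diff @ np.linalg.solve(V_i + V_j, diff)

assert np.allclose(d_sum, d_closed)
```

Note that the agreement holds here even for non-commuting variance matrices, since the pooling identity behind Eq. (59) is valid for any invertible \(V_{A_i}, V_{A_j}\).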


About this article

Cite this article

Ciampi, A., Lechevallier, Y., Limas, M.C. et al. Hierarchical clustering of subpopulations with a dissimilarity based on the likelihood ratio statistic: application to clustering massive data sets. Pattern Anal Applic 11, 199–220 (2008). https://doi.org/10.1007/s10044-007-0088-4
