
Hierarchical clustering of subpopulations with a dissimilarity based on the likelihood ratio statistic: application to clustering massive data sets

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

The problem of clustering subpopulations on the basis of samples is considered within a statistical framework: a distribution for the variables is assumed for each subpopulation, and the dissimilarity between any two subpopulations is defined as the likelihood ratio statistic comparing the hypothesis that the two subpopulations differ in the parameters of their distributions against the hypothesis that they do not. A general algorithm for the construction of a hierarchical classification is described, with the important property that the dendrogram has no inversions. The essential elements of the algorithm are specified for well-known distributions (normal, multinomial and Poisson), and an outline of the general parametric case is also discussed. Several applications are presented, the main one being a novel two-step approach to massive data: after clustering the data into a reasonable number of ‘bins’ by a fast algorithm such as k-Means, a version of our algorithm is applied to the resulting bins. Multivariate normality of the means computed on each bin is assumed; this is justified by the central limit theorem and by the assumption that each bin contains a large number of units, which generally holds for truly massive data such as currently found in modern data analysis. No assumption, however, is made about the data-generating distribution.
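The two-step approach can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: all function names are ours, the binning step is a plain Lloyd's k-Means iteration, and the merging step uses the normal-case dissimilarity and inverse-variance pooling rules given in the Appendix (Eqs. 45 and 53).

```python
import numpy as np

def kmeans_bins(X, k, iters=50, seed=0):
    """Step 1: coarse binning of the raw data with Lloyd's k-Means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bin_summaries(X, labels):
    """Per-bin mean and variance of the mean, V_s = S_s / n_s."""
    means, Vs = [], []
    for j in np.unique(labels):
        Xj = X[labels == j]
        means.append(Xj.mean(axis=0))
        Vs.append(np.cov(Xj.T) / len(Xj))
    return means, Vs

def lr_dissimilarity(m_s, V_s, m_t, V_t):
    """d(B_s, B_t) = (xbar_s - xbar_t)^T (V_s + V_t)^{-1} (xbar_s - xbar_t), Eq. (45)."""
    diff = m_s - m_t
    return float(diff @ np.linalg.solve(V_s + V_t, diff))

def agglomerate(means, Vs):
    """Step 2: greedily merge the two closest bins, pooling by Eq. (53):
    V^{-1} = V_s^{-1} + V_t^{-1},  xbar = V (V_s^{-1} xbar_s + V_t^{-1} xbar_t)."""
    means, Vs = list(means), list(Vs)
    merge_heights = []
    while len(means) > 1:
        pairs = [(lr_dissimilarity(means[s], Vs[s], means[t], Vs[t]), s, t)
                 for s in range(len(means)) for t in range(s + 1, len(means))]
        d, s, t = min(pairs)
        V = np.linalg.inv(np.linalg.inv(Vs[s]) + np.linalg.inv(Vs[t]))
        m = V @ (np.linalg.solve(Vs[s], means[s]) + np.linalg.solve(Vs[t], means[t]))
        for idx in sorted((s, t), reverse=True):
            del means[idx]; del Vs[idx]
        means.append(m); Vs.append(V)
        merge_heights.append(d)
    return merge_heights
```

The expensive pass over the raw data happens only once, in `kmeans_bins`; the hierarchical step then works on the bin summaries alone, which is what makes the scheme practical for massive data sets.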


Notes

  1. The proofs of this and the other propositions of this paper are given in Sect. 7.


Acknowledgements

We wish to thank Dr. F. Clavel for having shared the EPIC French data with us and Dr. E. Riboli for general assistance in becoming familiar with EPIC. The authors gratefully acknowledge the hospitality of the members of the Department of Epidemiology and Statistics during the visits of M. Castejón and A. González to McGill University. M. Castejón also thanks INRIA members for their hospitality during his visit. We gratefully acknowledge support from the Ministerio de Educación y Ciencia de España, Dirección General de Investigación, through the DPI2006-14784 and DPI2007-61090 research projects.

Author information


Correspondence to Antonio Ciampi.

Appendix: Proofs


Proof of Proposition 1

Since \(d(A_i, A_j) > 0\), we have \(f(A_r) > 0\) for \(r = 1, 2, \ldots, M\). Thus \(f(A_i \cup A_j) > f(A_i) + f(A_j)\) and, obviously, \(f(A_i \cup A_j) > \max \{f(A_i), f(A_j)\}\).

Proof of Proposition 2

$$ d\left(B_s,B_t\right)=\left({\overline{x}}_s-{\overline{x}}_{s \cup t}\right)^{\rm T} V_{s}^{-1} \left({\overline{x}}_s-{\overline{x}}_{s \cup t}\right)+ \left({\overline{x}}_t-{\overline{x}}_{s \cup t}\right)^{\rm T} V_{t}^{-1} \left({\overline{x}}_t-{\overline{x}}_{s \cup t}\right) $$
(41)

Now:

$$ {\overline{x}}_s-{\overline{x}}_{s \cup t} = \left(V_{s}^{-1}+V_{t}^{-1}\right)^{-1} \left[\left(V_{s}^{-1}+V_{t}^{-1}\right){\overline{x}}_s - V_{s}^{-1} {\overline{x}}_s - V_{t}^{-1} {\overline{x}}_t \right] $$
(42)
$$ {\overline{x}}_s-{\overline{x}}_{s \cup t} = \left(V_{s}^{-1}+V_{t}^{-1}\right)^{-1} V_t^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) $$
(43)
$$ {\overline{x}}_t-{\overline{x}}_{s \cup t} = \left(V_{s}^{-1}+V_{t}^{-1}\right)^{-1} V_s^{-1} \left({\overline{x}}_t-{\overline{x}}_s\right) $$
(44)

Therefore:

$$ \begin{aligned} d\left(B_s,B_t\right) &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} V_t^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_s^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_t^{-1}\left({\overline{x}}_s- {\overline{x}}_t\right) \\ &\quad + \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} V_s^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_t^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1}V_s^{-1}\left({\overline{x}}_s- {\overline{x}}_t\right) \\ &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s+V_t\right)^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1} V_t^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) \\ &\quad + \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s+V_t\right)^{-1} \left(V_s^{-1}+V_t^{-1}\right)^{-1} V_s^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) \\ &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T}\left(V_s+V_t\right)^{-1}\left(V_s^{-1}+V_t^{-1}\right)\left(V_s^{-1}+V_t^{-1}\right)^{-1}\left({\overline{x}}_s-{\overline{x}}_t\right) \\ &= \left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s+V_t\right)^{-1} \left({\overline{x}}_s-{\overline{x}}_t\right) \end{aligned} $$
(45)

where we have used

$$ V_s^{-1}\left(V_s^{-1}+V_t^{-1}\right)^{-1}V_t^{-1}= \left(V_s+V_t\right)^{-1} $$
(46)

Notice also that we can write

$$ d\left(B_s,B_t\right)=\left({\overline{x}}_s-{\overline{x}}_t\right)^{\rm T} \left(V_s^{-1}+V_t^{-1}\right)^{-1} V_s^{-1}V_t^{-1}\left({\overline{x}}_s-{\overline{x}}_t\right) $$
(47)

owing to the commutativity of symmetric matrices.
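The key matrix identity (46) is easy to sanity-check numerically. In the sketch below (ours, not part of the paper), identity (46) is verified for arbitrary symmetric positive-definite matrices, while the rearranged form (47) is checked in the simple commuting case of diagonal matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(p):
    # A A^T plus a ridge term yields a symmetric positive-definite matrix.
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)

p = 4
Vs, Vt = random_spd(p), random_spd(p)
Vs_inv, Vt_inv = np.linalg.inv(Vs), np.linalg.inv(Vt)

# Identity (46): V_s^{-1} (V_s^{-1} + V_t^{-1})^{-1} V_t^{-1} = (V_s + V_t)^{-1}.
lhs = Vs_inv @ np.linalg.inv(Vs_inv + Vt_inv) @ Vt_inv
rhs = np.linalg.inv(Vs + Vt)
assert np.allclose(lhs, rhs)

# Form (47) rearranges the factors and, as the text notes, relies on
# commutativity; diagonal matrices provide a simple commuting case.
diff = rng.normal(size=p)
Ds, Dt = np.diag(rng.uniform(1, 2, p)), np.diag(rng.uniform(1, 2, p))
d45 = diff @ np.linalg.solve(Ds + Dt, diff)                       # Eq. (45)
M47 = (np.linalg.inv(np.linalg.inv(Ds) + np.linalg.inv(Dt))
       @ np.linalg.inv(Ds) @ np.linalg.inv(Dt))
d47 = diff @ M47 @ diff                                           # Eq. (47)
assert np.allclose(d45, d47)
```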

Proof of Proposition 3

$$ d\left(A_i, A_j \right) = \sum_{r \in A_i \cup A_j} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) - \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) $$
(48)
$$ d\left(A_i, A_j \right) = \sum_{i=1}^2 \left\{\sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) - \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)\right\} $$
(49)

Let us develop the first term within the curly bracket above:

$$ \begin{aligned} \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1}\left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) &= \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} + {\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} + {\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &= \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) + \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &\quad + 2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \end{aligned} $$
(50)

Now notice that the last term is zero. This follows from:

$$ \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) = \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} \sum_{r \in A_i} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) $$
(51)

and

$$ \sum_{r \in A_i} V_r^{-1}\left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)= \sum_{r \in A_i} V_r^{-1} {\overline{{\mathbf{x}}}}_{r} - \sum_{r \in A_i} V_r^{-1} {\overline{{\mathbf{x}}}}_{A_i}=V_{A_i}^{-1} {\overline{{\mathbf{x}}}}_{A_i} - \left(V_{A_i} \right)^{-1} {\overline{{\mathbf{x}}}}_{A_i} = 0 $$
(52)

since by definition

$$ V_{A_i}^{-1}= \sum_{r \in A_i} V_r^{-1}, \quad {\overline{{\mathbf{x}}}}_{A_i}=V_{A_i} \sum_{r \in A_i} V_r^{-1} {\overline{{\mathbf{x}}}}_{r} $$
(53)

We now have,

$$ \begin{aligned} d\left(A_i, A_j \right) &= \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) + \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &\quad - \sum_{i=1}^2 \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{r} - {\overline{{\mathbf{x}}}}_{A_i} \right) \\ &= \sum_{i=1}^2 \left\{\sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \right\} \end{aligned} $$
(54)

and we can proceed as in Proposition 2. To avoid confusion, we write

$$ {{\mathcal{U}}}_r= V_r^{-1}, \quad {{\mathcal{U}}}_{A_i}=\sum_{r \in A_i} V_r^{-1} = \sum_{r \in A_i} {{\mathcal{U}}}_r, \quad {{\mathcal{U}}}_{A_i \cup A_j}={{\mathcal{U}}}_{A_i} + {{\mathcal{U}}}_{A_j} $$

whence

$$ {\overline{{\mathbf{x}}}}_{A_i \cup A_j} = {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i} + {{\mathcal{U}}}_{A_j} {\overline{{\mathbf{x}}}}_{A_j}\right] $$
(55)

and therefore

$$ \begin{aligned} {\overline{{\mathbf{x}}}}_{A_i}-{\overline{{\mathbf{x}}}}_{A_i \cup A_j} &= {{\mathcal{U}}}_{A_i}^{-1} \sum_{r \in A_i} {{\mathcal{U}}}_r {\overline{{\mathbf{x}}}}_{r} - {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i}{\overline{{\mathbf{x}}}}_{A_i}+{{\mathcal{U}}}_{A_j}{\overline{{\mathbf{x}}}}_{A_j}\right] \\ &= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i \cup A_j} {\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_j} {\overline{{\mathbf{x}}}}_{A_j} \right] \\ &= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left[{{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i}+ {{\mathcal{U}}}_{A_j}{\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_i} {\overline{{\mathbf{x}}}}_{A_i} - {{\mathcal{U}}}_{A_j} {\overline{{\mathbf{x}}}}_{A_j} \right] \\ &= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_j} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \end{aligned} $$
(56)

and similarly

$$ {\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j}= {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} \left({\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i} \right) $$
(57)

Substituting into the expression for \(d(A_i, A_j)\) above (Eq. 54):

$$ \begin{aligned} d\left(A_i, A_j \right) &= \sum_{r \in A_i} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) + \sum_{r \in A_j} \left({\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right)^{\rm T} V_r^{-1} \left({\overline{{\mathbf{x}}}}_{A_j} - {\overline{{\mathbf{x}}}}_{A_i \cup A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left(\sum_{r \in A_i} V_r^{-1} \right) {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) + \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left(\sum_{r \in A_j} V_r^{-1} \right) {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) + \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_j} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j} \left({{\mathcal{U}}}_{A_i} + {{\mathcal{U}}}_{A_j} \right) {{\mathcal{U}}}_{A_i \cup A_j}^{-1} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \\ &= \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right)^{\rm T} {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j} \left({\overline{{\mathbf{x}}}}_{A_i} - {\overline{{\mathbf{x}}}}_{A_j} \right) \end{aligned} $$
(58)

where we have used the commutativity of symmetric matrices, the definition \({{\mathcal{U}}}_{A_i}=\sum_{r \in A_i} V_r^{-1},\) and \({{\mathcal{U}}}_{A_i \cup A_j}={{\mathcal{U}}}_{A_i} + {{\mathcal{U}}}_{A_j}.\) Finally noting that:

$$ {{\mathcal{U}}}_{A_i \cup A_j}^{-1} {{\mathcal{U}}}_{A_i} {{\mathcal{U}}}_{A_j}= \left(V_{A_i}^{-1} + V_{A_j}^{-1} \right)^{-1} V_{A_i}^{-1} V_{A_j}^{-1}=\left(V_{A_i} + V_{A_j} \right)^{-1} $$
(59)

we have

$$ d\left(A_i,A_j\right)=\left({\overline{{\mathbf{x}}}}_{A_i}- {\overline{{\mathbf{x}}}}_{A_j}\right)^{\rm T} \left(V_{A_i} + V_{A_j} \right)^{-1} \left({\overline{{\mathbf{x}}}}_{A_i}-{\overline{{\mathbf{x}}}}_{A_j}\right) $$
(60)
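Proposition 3 can likewise be checked numerically: the between-group likelihood ratio statistic computed from its defining sum (Eq. 48) should coincide with the closed form (Eq. 60). The sketch below is ours, with illustrative function names, using randomly generated bin means and positive-definite variance matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3

def random_spd(p):
    # Random symmetric positive-definite variance matrix.
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)

# Two groups of bins, each bin summarized by (mean, variance matrix).
A_i = [(rng.normal(size=p), random_spd(p)) for _ in range(3)]
A_j = [(rng.normal(size=p), random_spd(p)) for _ in range(2)]

def pooled(bins):
    """Eq. (53): V_A^{-1} = sum_r V_r^{-1}, xbar_A = V_A sum_r V_r^{-1} xbar_r."""
    V_A = np.linalg.inv(sum(np.linalg.inv(V) for _, V in bins))
    xbar = V_A @ sum(np.linalg.solve(V, x) for x, V in bins)
    return xbar, V_A

def within(bins, center):
    """Sum of quadratic forms (x_r - center)^T V_r^{-1} (x_r - center)."""
    return sum((x - center) @ np.linalg.solve(V, x - center) for x, V in bins)

x_i, V_i = pooled(A_i)
x_j, V_j = pooled(A_j)
x_u, _ = pooled(A_i + A_j)

# Eq. (48): LR statistic as a difference of within-group sums.
d_sum = within(A_i + A_j, x_u) - within(A_i, x_i) - within(A_j, x_j)

# Eq. (60): closed form.
diff = x_i - x_j
d_closed = diff @ np.linalg.solve(V_i + V_j, diff)

assert np.allclose(d_sum, d_closed)
```

Note that the agreement holds here even for non-commuting variance matrices, since the pooling identity behind Eq. (59) is valid for any invertible \(V_{A_i}, V_{A_j}\).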


About this article

Cite this article

Ciampi, A., Lechevallier, Y., Limas, M.C. et al. Hierarchical clustering of subpopulations with a dissimilarity based on the likelihood ratio statistic: application to clustering massive data sets. Pattern Anal Applic 11, 199–220 (2008). https://doi.org/10.1007/s10044-007-0088-4
