Combinatorial optimisation and hierarchical classifications

Annals of Operations Research

Abstract

This paper is devoted to selected topics at the interface of combinatorial optimization and hierarchical classification. It focuses on extensions of the standard classification schemes (hierarchies): pyramids, quasi-hierarchies, circular clustering, rigid clustering and others. Bijection theorems between these models and dissimilarity models allow some clustering problems to be stated as optimization problems. Within the broad field of optimization, we especially discuss the following: NP-completeness results and the search for polynomial instances; problems solvable in polynomial time (e.g. subdominant theory); and the design, analysis and applications of algorithms. In contrast with this orientation toward “new” clustering problems, the last part discusses some standard algorithmic approaches.
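A classical instance of the polynomially solvable side of subdominant theory is the subdominant ultrametric of a dissimilarity d: the largest ultrametric u with u ≤ d. It is well known that u(x, y) equals the minimax path value between x and y in a minimum spanning tree of d, so u is exactly the single-linkage hierarchy (cf. Gower & Ross, 1969), and through the Benzécri–Johnson bijection it corresponds to an indexed hierarchy. The Python sketch below is our own illustration, not code from the paper; it assumes d is given as a symmetric list of lists with a zero diagonal, and the function names `subdominant_ultrametric` and `is_ultrametric` are hypothetical.

```python
from itertools import combinations


def subdominant_ultrametric(d):
    """Largest ultrametric u with u[i][j] <= d[i][j] for all i, j.

    Assumes d is a symmetric list of lists with a zero diagonal.
    Pairs are scanned in increasing order of dissimilarity (Kruskal's
    algorithm); when a pair first joins two components at level w, every
    cross pair receives w, i.e. its minimax path value in a minimum
    spanning tree (the single-linkage level).
    """
    n = len(d)
    parent = list(range(n))                  # union-find forest
    members = {i: [i] for i in range(n)}     # points of each component, keyed by root
    u = [[0] * n for _ in range(n)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for w, i, j in sorted((d[i][j], i, j) for i, j in combinations(range(n), 2)):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                         # already joined at a level <= w
        for a in members[ri]:
            for b in members[rj]:
                u[a][b] = u[b][a] = w        # first level at which a and b are merged
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
    return u


def is_ultrametric(u):
    """Check the ultrametric inequality u(i, j) <= max(u(i, k), u(k, j))."""
    n = len(u)
    return all(u[i][j] <= max(u[i][k], u[k][j])
               for i in range(n) for j in range(n) for k in range(n))


if __name__ == "__main__":
    d = [[0, 2, 6, 10],
         [2, 0, 5, 9],
         [6, 5, 0, 4],
         [10, 9, 4, 0]]
    u = subdominant_ultrametric(d)
    print(u)                                   # [[0, 2, 5, 5], [2, 0, 5, 5], [5, 5, 0, 4], [5, 5, 4, 0]]
    print(is_ultrametric(d), is_ultrametric(u))  # False True
```

Each entry of u is written exactly once, when its two points first fall into the same component, so the cost is dominated by sorting the n(n−1)/2 pairs, i.e. O(n² log n). Applying the function to an ultrametric returns it unchanged, as expected of a subdominant.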

References

  • Aho, A. V., Hopcroft, J. E., & Ullman, J. D. (1974). The design and analysis of computer algorithms (pp. 74–84). Reading: Addison-Wesley.

  • Asprejan, J.-D. (1966). Un algorithme pour construire des classes d’après une matrice de distance. Mashinnyi Perevod i Prikladnaja Lingvistika, 9, 3–18.

  • Bandelt, H.-J. (1992). Four-point characterization of the dissimilarity function obtained from indexed closed weak hierarchies. Mathematisches Seminar, Hamburg Universität.

  • Bandelt, H.-J., & Dress, A. W. M. (1989). Weak hierarchies associated with a similarity measure: an additive clustering technique. Bulletin of Mathematical Biology, 51(1), 133–166.

  • Bandelt, H.-J., & Dress, A. W. M. (1994). An order theoretic framework for overlapping clustering. Discrete Mathematics, 136, 21–37.

  • Barthélemy, J.-P. (2003). Classifications binaires. In: Actes des Rencontres de la Société Francophone de Classification (pp. 67–69).

  • Barthélemy, J.-P., & Brucker, F. (2001). NP-hard approximation problems in overlapping clustering. Journal of Classification, 18(2), 159–183.

  • Barthélemy, J.-P., & Brucker, F. (2004). Binary clustering, preprint.

  • Barthélemy, J.-P., & Guénoche, A. (1988). Les arbres et les représentations de proximité. Paris: Masson.

  • Barthélemy, J.-P., Brucker, F., & Osswald, C. (2004). Combinatorial optimisation and hierarchical classifications. 4OR, 2(3), 179–219.

  • Batbedat, A. (1988). Les isomorphismes hte et hts, après la bijection de Benzécri–Johnson. Metron, 46, 47–59.

  • Batbedat, A. (1989). Les dissimilarités médias et arbas. Statistiques et Analyse des Données, 14, 1–18.

  • Bayer, R. (1972). Symmetric binary B-trees: data structure and maintenance algorithms. Acta Informatica, 1, 290–306.

  • Benzécri, J.-P. (1973). L’analyse des données. Paris: Dunod.

  • Bertrand, P. (2000). Set systems and dissimilarities. European Journal of Combinatorics, 21, 727–743.

  • Bertrand, P. (2002). Set systems for which each set properly intersects at most one other set: application to pyramidal clustering. Cahiers du Ceremade, 0202.

  • Bertrand, P., & Janowitz, M. (2003). The k-weak hierarchies: an extension of the weak hierarchical clustering structure. Discrete Applied Mathematics, 127, 199–220.

  • Birkhoff, G. (1967). Lattice theory. Providence: American Mathematical Society.

  • Borůvka, O. (1926a). O jistém problému minimálním (about a certain minimal problem). Práce Moravské Přírodovědecké Společnosti v Brně, 3, 37–58.

  • Borůvka, O. (1926b). Příspěvek k řešení otázky ekonomické stavby elektrovodných sítí (contribution to the solution of the problem of economical construction of electrical networks). Elektrotechnický Obzor, 15, 153–154.

  • Brucker, F. (2001). Modèles de classification en classes empiétantes. PhD thesis, EHESS and ENST Bretagne.

  • Brucker, F. (2002). Sub-dominant theory in numerical taxonomy. ENST-Bretagne, submitted.

  • Brucker, F. (2003). Réalisations de dissimilarités. In: Actes des rencontres de la société francophone de classification (pp. 7–10).

  • Brucker, F. (2004). From hypertrees to arboreal quasi-ultrametrics. Discrete Applied Mathematics, to appear.

  • Brucker, F., Osswald, C., & Barthélemy, J.-P. (2003). Rigid hypergraphs: combinatorial optimization problem in clustering and similarity analysis. In: INOC 2003 Proceedings (pp. 126–133).

  • Brucker, P. (1978). On the complexity of clustering problems. In: M. Beckmann, & H. P. Künzi (Eds.), Optimization and operations research (pp. 45–54). Heidelberg: Springer.

  • Carroll, J. D. (1976). Spatial, non-spatial and hybrid models for scaling. Psychometrika, 41, 439–463.

  • Carroll, J. D., & Chang, J. J. (1973). A method for fitting a class of hierarchical tree structure models to dissimilarity data, and its application to some body part data of Miller. In: Proc. of the 81st convention of the American Psychological Association (Vol. 8, pp. 1097–1098).

  • Carroll, J. D., & Pruzansky, S. (1980). Discrete and hybrid scaling methods. In: E. D. Lantermann, & H. Feger (Eds.), Similarity and choice. Bern: Hans Huber.

  • Cavalli-Sforza, L. L., & Edwards, A. W. F. (1967). Phylogenetic analysis: models and estimation procedures. American Journal of Human Genetics, 19, 233–257.

  • Chandon, J. L., Lemaire, J., & Pouget, J. (1980). Construction de l’ultramétrique la plus proche d’une dissimilarité au sens des moindres carrés. RAIRO Recherche Opérationnelle, 14, 157–170.

  • Chen, Z. (1996). Space-conserving agglomerative algorithms. Journal of Classification, 13, 157–168.

  • Chepoï, V., & Fichet, B. (1997). Recognition of Robinsonian dissimilarities. Journal of Classification, 14, 311–325.

  • Chepoï, V., & Fichet, B. (2000). L∞-approximation via subdominants. Journal of Mathematical Psychology, 44(4), 600–616.

  • Choquet, G. (1938). Étude de certains réseaux de routes. Comptes-rendus de l’Académie des Sciences, 206, 310–313.

  • Colonius, H., & Schulze, H. H. (1981). Tree structures for proximity data. British Journal of Mathematical and Statistical Psychology, 34, 167–180.

  • Diatta, J. (1996). Une extension de la classification hiérarchique: les quasi-hiérarchies. PhD thesis, Université de Provence.

  • Diatta, J. (1997). Dissimilarités multivoies et généralisations d’hypergraphes sans triangles. Mathématiques, Informatique et Sciences Humaines, 138, 57–73.

  • Diatta, J. (1998). Approximating dissimilarities by quasi-ultrametrics. Discrete Mathematics, 192, 81–86.

  • Diatta, J., & Fichet, B. (1994). From Asprejan hierarchies and Bandelt-Dress weak-hierarchies to quasi-hierarchies. In: E. Diday et al. (Eds.), New approaches in classification and data analysis (pp. 111–118). Berlin: Springer.

  • Diday, E. (1971). Une nouvelle méthode en classification automatique et reconnaissance des formes: la méthode des nuées dynamiques. Revue de Statistique Appliquée, 19(2), 19–33.

  • Diday, E. (1983). Inversions en classification automatique: applications à la construction adaptative d’indices d’agrégation. Revue de Statistique Appliquée, 31(1), 45–62.

  • Diday, E. (1984). Une représentation visuelle des classes empiétantes: les pyramides. Research report 291, INRIA.

  • Diday, E. (1986). Orders and overlapping clusters in pyramids. In: J. de Leeuw, et al. (Eds.), Multidimensional data analysis proceedings (pp. 201–234).

  • Dijkstra, E. W. (1959). Two problems in connection with graphs. Numerische Mathematik, 1, 269–271.

  • Duchet, P. (1979). Représentations, Noyaux en théorie des graphes et hypergraphes. PhD thesis, Université Paris VI, doctorat d’état.

  • Duchet, P. (1984). Classical perfect graphs: an introduction with emphasis on triangulated and interval graphs. In: Topics in perfect graphs (pp. 67–96). Amsterdam: North-Holland.

  • Durand, C. (1989). Ordres et graphes pseudo-hiérarchiques: théorie et optimisation algorithmique. PhD thesis, Université de Provence.

  • Farris, J. S. (1969). On the cophenetic correlation coefficient. Systematic Zoology, 18, 279–285.

  • Fichet, B. (1984). Sur une extension de la notion de hiérarchie et son équivalence avec quelques matrices de Robinson. In: Actes des “Journées de statistique de la Grande Motte” (pp. 12–12).

  • Fichet, B. (1986). Data analysis: geometric and algebraic structures. In: Y. A. Prohorov, et al. (Eds.), First world congress of the Bernoulli society proceedings (pp. 123–132). V.N.U. Science Press.

  • Fichet, B. (2001). Ultramétriques supérieures minimales sous contraintes. In: SFC’01 (pp. 147–150).

  • Flament, C. (1962). L’analyse de similitude. Cahiers du Centre de Recherche Opérationnelle, 4, 63–97.

  • Flament, C. (1976). Hypergraphes et analyse de données. Séminaire INRIA.

  • Flament, C. (1978). Hypergraphes arborés. Discrete Mathematics, 21, 223–227.

  • Flament, C., Degenne, A., & Vergès, P. (1979). Analyse de similitude ordinale. Informatique et Sciences Humaines, 40–41, 223–231.

  • Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H., & Zubrzycki, S. (1951a). Sur la liaison et la division des points d’un ensemble fini. Colloquium Mathematicae, 2, 282–285.

  • Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H., & Zubrzycki, S. (1951b). Taksonomia wrocławska. Przegląd Antropologiczny, 17, 193–207.

  • Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: a guide to the theory of NP-completeness. New York: Freeman.

  • Gilmore, P. C. (1962). Families of sets with faithful graph representation. Technical report, Thomas J. Watson Research Center, Yorktown Heights.

  • Gordon, A. D. (1999). Classification methods for the exploratory analysis of multivariate data. London: Chapman and Hall.

  • Gower, J. C., & Ross, G. J. S. (1969). Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18, 54–64.

  • Hansen, P., & Jaumard, B. (1997). Cluster analysis and mathematical programming. Mathematical Programming, 79, 191–215.

  • Hansen, P., Jaumard, B., & Sanlaville, E. (1994). Partitioning problems in cluster analysis: a review of mathematical approaches. In: E. Diday, et al. (Eds.), New approaches in classification and data analysis (pp. 228–240). Berlin: Springer.

  • Hansen, P., Jaumard, B., & Mladenović, N. (1995). How to choose k entities among n. In: I. Cox, P. Hansen, & B. Julesz (Eds.), Partitioning data sets (pp. 105–116). Providence.

  • Hartigan, J. A. (1967). Representation of similarity matrices by trees. Journal of the American Statistical Association, 62, 1140–1158.

  • Hartigan, J. A. (1975). Clustering algorithms. Chichester: Wiley.

  • Henley, N. M. (1969). A psychological study of the semantics of animal terms. Journal of Verbal Learning and Verbal Behavior, 8, 176–184.

  • Hubert, L., Arabie, P., & Meulman, J. (1997). Linear and circular unidimensional scaling for symmetric proximity matrices. British Journal of Mathematical and Statistical Psychology, 50, 253–284.

  • Hubert, L., Arabie, P., & Meulman, J. (1998). Graph-theoretic representations for proximity matrices through strongly-anti-Robinsonian or circular strongly-anti-Robinsonian matrices. Psychometrika, 63(4), 341–358.

  • Janowitz, M. F. (1978). An order theoretic model for cluster analysis. SIAM Journal on Applied Mathematics, 34, 55–72.

  • Janowitz, M. F. (1979). Monotone equivariant cluster methods. SIAM Journal on Applied Mathematics, 37, 148–165.

  • Janowitz, M. F. (1981). Continuous L-clustering methods. Discrete Applied Mathematics, 3, 100–112.

  • Jardine, J. P. J., Jardine, N., & Sibson, R. (1967). The structure and construction of taxonomic hierarchies. Mathematical Biosciences, 1, 171–179.

  • Jardine, N., & Sibson, R. (1971). Mathematical taxonomy, part II. London: Wiley.

  • Jarník, V. (1930). O jistém problému minimálním. Práce Moravské Přírodovědecké Společnosti v Brně (Acta Societatis Scientiarum Naturalium Moravicae), 4, 57–63.

  • Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241–254.

  • Křivánek, M., & Morávek, J. (1986). NP-hard problems in hierarchical-tree clustering. Acta Informatica, 23, 311–323.

  • Kruskal, J. B. (1956). On the shortest spanning subtree of a graph and the travelling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50.

  • Lance, G. N., & Williams, W. T. (1967a). A general theory of classificatory sorting strategies: I. Hierarchical systems. The Computer Journal, 9(4), 373–380.

  • Lance, G. N., & Williams, W. T. (1967b). A general theory of classificatory sorting strategies: II. Clustering systems. The Computer Journal, 10(3), 271–277.

  • Leclerc, B. (1981). Description combinatoire des ultramétriques. Mathématiques et Sciences Humaines, 73, 5–37.

  • Leclerc, B. (1984). Comment reconnaître un hypergraphe arboré. Cahiers du CAMS.

  • Leclerc, B. (1985a). Les hiérarchies de parties et leur demi-treillis. Mathématiques et Sciences Humaines, 89, 5–34.

  • Leclerc, B. (1985b). La comparaison de hiérarchies : indices et métriques. Mathématiques et Sciences Humaines, 92, 5–40.

  • McQuitty, L. L. (1957). Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educational and Psychological Measurement, 17, 207–229.

  • Osswald, C. (2003a). Classification, analyse de la similitude et hypergraphes. PhD thesis, EHESS and ENST Bretagne.

  • Osswald, C. (2003b). Dissimilarités circulaires et hypercycles. In: Actes des rencontres de la société francophone de classification (pp. 165–168).

  • Osswald, C. (2003c). Robustesse aux variations de méthode pour la classification hiérarchique. In: XXXVèmes Journées de Statistiques (pp. 751–754). Lyon, 2003. SFdS.

  • Prim, R. C. (1957). Shortest connection networks and some generalizations. Bell System Technical Journal, 36, 1389–1401.

  • Quilliot, A. (1984). Circular representation problem on hypergraphs. Discrete Mathematics, 51, 251–264.

  • Reingold, E. M., Nievergelt, J., & Deo, N. (1977). Combinatorial algorithms: theory and practice. Englewood Cliffs: Prentice-Hall.

  • Robinson, W. S. (1951). A method for chronologically ordering archaeological deposits. American Antiquity, 16, 295–301.

  • Roux, M. (1968). Un algorithme pour trouver une hiérarchie particulière. PhD thesis, ISUP, Paris.

  • Roux, M. (1985). Algorithmes de classification. Paris: Masson.

  • Sibson, R. (1971). Some observations on a paper by Lance and Williams. The Computer Journal, 14, 156–157.

  • Sneath, P. H. A. (1957). The application of computers to taxonomy. Journal of General Microbiology, 17, 201–226.

  • Sokal, R. R., & Rohlf, F. J. (1962). The comparison of dendrograms by objective methods. Taxon, 11, 33–40.

  • Sokal, R. R., & Sneath, P. H. A. (1963). Principles of numerical taxonomy. San Francisco: Freeman.

  • Sørensen, T. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biologiske Skrifter, 5(4), 1–34.

  • Van Cutsem, B. (Ed.) (1994). Classification and dissimilarity analysis. Lecture notes in statistics (Vol. 93). New York: Springer.

Author information

Correspondence to J.-P. Barthélemy.

Additional information

This article appeared in 4OR 2, 179–219, 2004.

About this article

Cite this article

Barthélemy, JP., Brucker, F. & Osswald, C. Combinatorial optimisation and hierarchical classifications. Ann Oper Res 153, 179–214 (2007). https://doi.org/10.1007/s10479-007-0174-4

  • DOI: https://doi.org/10.1007/s10479-007-0174-4
