Abstract
Condorcet’s Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet’s Jury Theorem to the mean partition approach under the additional assumptions that a unique but unknown ground-truth partition exists and sample partitions are drawn from a sufficiently small ball containing the ground-truth.
Notes
1. The support of Q is the smallest closed subset \({\mathcal {S}}_Q \subseteq {\mathcal {P}}\) such that \(Q({\mathcal {S}}_Q) = 1\).
2. Recall that a mean partition is not unique in general.
References
Berend, D., Paroush, J.: When is Condorcet's jury theorem valid? Soc. Choice Welf. 15(4), 481–488 (1998)
Bhattacharya, A., Bhattacharya, R.: Nonparametric Inference on Manifolds with Applications to Shape Spaces. Cambridge University Press, Cambridge (2012)
Bredon, G.E.: Introduction to Compact Transformation Groups. Elsevier, New York City (1972)
de Condorcet, N.C.: Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris (1785)
Dimitriadou, E., Weingessel, A., Hornik, K.: A combination scheme for fuzzy clustering. In: Advances in Soft Computing (2002)
Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
Domeniconi, C., Al-Razgan, M.: Weighted cluster ensembles: methods and analysis. ACM Trans. Knowl. Discov. Data 2(4), 1–40 (2009)
Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis. Wiley, Hoboken (1998)
Feragen, A., Lo, P., De Bruijne, M., Nielsen, M., Lauze, F.: Toward a theory of statistical tree-shape analysis. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2008–2021 (2013)
Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. Int. J. Artif. Intell. Tools 13(4), 863–880 (2004)
Franek, L., Jiang, X.: Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recognit. 47(2), 833–842 (2014)
Fréchet, M.: Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l’institut Henri Poincaré 10, 215–310 (1948)
Ginestet, C.E.: Strong Consistency of Fréchet Sample Mean Sets for Graph-Valued Random Variables. arXiv: 1204.3183 (2012)
Ghaemi, R., Sulaiman, N., Ibrahim, H., Mustapha, N.: A survey: clustering ensembles techniques. Proc. World Acad. Sci. Eng. Technol. 38, 644–657 (2009)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM Trans. Knowl. Discov. Data 1(1), 341–352 (2007)
Grofman, B., Owen, G., Feld, S.L.: Thirteen theorems in search of the truth. Theory Decis. 15(3), 261–278 (1983)
Huckemann, S., Hotz, T., Munk, A.: Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica 20, 1–100 (2010)
Jain, B.J., Obermayer, K.: Structure spaces. J. Mach. Learn. Res. 10, 2667–2714 (2009)
Jain, B.J.: Geometry of Graph Edit Distance Spaces. arXiv: 1505.08071 (2015)
Jain, B.J.: Asymptotic Behavior of Mean Partitions in Consensus Clustering. arXiv:1512.06061 (2015)
Jain, B.J.: Statistical analysis of graphs. Pattern Recognit. 60, 802–812 (2016)
Jain, B.J.: Homogeneity of Cluster Ensembles. arXiv:1602.02543 (2016)
Jain, B.J.: The Mean Partition Theorem of Consensus Clustering. arXiv:1604.06626 (2016)
Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bul. Lond. Math. Soc. 16, 81–121 (1984)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)
Lam, L., Suen, C.Y.: Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 27(5), 553–568 (1997)
Li, T., Ding, C., Jordan, M.I.: Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: IEEE International Conference on Data Mining (2007)
Marron, J.S., Alonso, A.M.: Overview of object oriented data analysis. Biom. J. 56(5), 732–753 (2014)
Polikar, R.: Ensemble learning. Scholarpedia 4(1), 2776 (2009)
Ratcliffe, J.G.: Foundations of Hyperbolic Manifolds. Springer, New York (2006). https://doi.org/10.1007/978-0-387-47322-2
Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)
Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Surowiecki, J.: The Wisdom of Crowds. Anchor, New York City (2005)
Topchy, A.P., Jain, A.K., Punch, W.: Clustering ensembles: models of consensus and weak partitions. IEEE Trans. Pattern Anal. Mach. Intell. 27(12), 1866–1881 (2005)
Vega-Pons, S., Correa-Morris, J., Ruiz-Shulcloper, J.: Weighted partition consensus via kernels. Pattern Recognit. 43(8), 2712–2724 (2010)
Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recognit. Artif. Intell. 25(03), 337–372 (2011)
Waldron, J.: The wisdom of the multitude: some reflections on Book III, Chapter 11 of the Politics. Polit. Theory 23, 563–584 (1995)
Wang, H., Marron, J.S.: Object oriented data analysis: sets of trees. Ann. Stat. 35, 1849–1873 (2007)
Yang, F., Li, X., Li, Q., Li, T.: Exploring the diversity in cluster ensemble generation: random sampling and random projection. Expert Syst. Appl. 41(10), 4844–4866 (2014)
Zhou, Z.: Ensemble Methods: Foundations and Algorithms. Taylor & Francis Group, LLC, Abingdon (2012)
A Proof of Theorem 2
To prove Theorem 2, it is helpful to use a suitable representation of partitions. We represent partitions as points of a geometric space, called an orbit space [20]. Orbit spaces are well explored, possess a rich geometric structure, and have a natural connection to Euclidean spaces [3, 19, 30].
A.1 Partition Spaces
The group \(\varPi = \varPi ^\ell \) of all (\(\ell \times \ell \))-permutation matrices is a discontinuous group that acts on \({\mathcal {X}}\) by matrix multiplication, that is, \(({\varvec{P}}, {\varvec{X}}) \mapsto {\varvec{P}}{\varvec{X}}\) for all \({\varvec{P}} \in \varPi \) and \({\varvec{X}} \in {\mathcal {X}}\).
The orbit of \({\varvec{X}} \in {\mathcal {X}}\) is the set \(\mathop {\left[ {\varvec{X}} \right] } = \mathop {\left\{ {\varvec{PX}} \,:\, {\varvec{P}} \in \varPi \right\} }\). The orbit space of partitions is the quotient space \({\mathcal {X}}/\varPi = \mathop {\left\{ \mathop {\left[ {\varvec{X}} \right] } \,:\, {\varvec{X}} \in {\mathcal {X}} \right\} }\) obtained by the action of the permutation group \(\varPi \) on the set \({\mathcal {X}}\). We write \({\mathcal {P}} = {\mathcal {X}}/\varPi \) to denote the partition space and \(X \in {\mathcal {P}}\) to denote an orbit \([{\varvec{X}}] \in {\mathcal {X}}/\varPi \). The natural projection \(\pi : {\mathcal {X}} \rightarrow {\mathcal {P}}\) sends matrices \({\varvec{X}}\) to the partitions \(\pi ({\varvec{X}}) = \mathop {\left[ {\varvec{X}} \right] }\) they represent. The partition space \({\mathcal {P}}\) is endowed with the intrinsic metric \(\delta \) defined by \(\delta (X, Y) = \min \mathop {\left\{ \Vert {{\varvec{X}} - {\varvec{Y}}}\Vert \,:\, {\varvec{X}} \in X, {\varvec{Y}} \in Y \right\} }\).
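As an illustration, the intrinsic metric \(\delta \) can be evaluated by brute force when the number of clusters is small. The sketch below (a hypothetical helper, not part of the paper) represents a hard partition of m data points as an (\(\ell \times m\)) membership matrix and minimizes the Frobenius distance over all row permutations, i.e. over all representations of one orbit:

```python
from itertools import permutations

import numpy as np


def partition_metric(X, Y):
    """Intrinsic metric delta(X, Y) = min ||P X - Y|| over all
    (ell x ell)-permutation matrices P.  Brute force over the
    ell! permutations, so only feasible for small ell.

    X, Y: (ell x m) membership matrices representing hard
    partitions of the same m data points.
    """
    ell = X.shape[0]
    best = float("inf")
    for perm in permutations(range(ell)):
        # X[list(perm), :] equals P @ X for the permutation matrix P
        # that reorders the rows (clusters) of X according to perm.
        best = min(best, np.linalg.norm(X[list(perm), :] - Y))
    return best
```

Since \(\delta \) minimizes over representations, two membership matrices that differ only by a relabeling of clusters have distance zero, as the definition of the orbit space requires.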
A.2 Dirichlet Fundamental Domains
We use the following notations: By \(\overline{{\mathcal {U}}}\) we denote the closure of a subset \({\mathcal {U}} \subseteq {\mathcal {X}}\), by \(\partial {\mathcal {U}}\) the boundary of \({\mathcal {U}}\), and by \({\mathcal {U}}^\circ \) the open subset \(\overline{{\mathcal {U}}} \setminus \partial {\mathcal {U}}\). The action of permutation \({\varvec{P}} \in \varPi \) on the subset \({\mathcal {U}}\subseteq {\mathcal {X}}\) is the set defined by \({\varvec{P}}\,{\mathcal {U}} = \mathop {\left\{ {\varvec{PX}} \, :\, {\varvec{X}} \in {\mathcal {U}} \right\} }\). By \(\varPi ^* = \varPi \setminus \mathop {\left\{ {\varvec{I}} \right\} }\) we denote the subset of (\(\ell \times \ell \))-permutation matrices without identity matrix \({\varvec{I}}\).
A subset \({\mathcal {F}}\) of \({\mathcal {X}}\) is a fundamental set for \(\varPi \) if and only if \({\mathcal {F}}\) contains exactly one representation \({\varvec{X}}\) from each orbit \(\mathop {\left[ {\varvec{X}} \right] } \in {\mathcal {X}}/\varPi \). A fundamental domain of \(\varPi \) in \({\mathcal {X}}\) is a closed connected set \({\mathcal {F}} \subseteq {\mathcal {X}}\) that satisfies
1. \(\displaystyle {\mathcal {X}} = \bigcup _{{\varvec{P}} \in \varPi } {\varvec{P}}{\mathcal {F}}\),
2. \({\varvec{P}} {\mathcal {F}}^\circ \cap {\mathcal {F}}^\circ = \emptyset \) for all \({\varvec{P}} \in \varPi ^*\).
Proposition 1
Let \({\varvec{Z}}\) be a representation of an asymmetric partition \(Z \in {\mathcal {P}}\). Then the set
\({\mathcal {D}}_{{\varvec{Z}}} = \mathop {\left\{ {\varvec{X}} \in {\mathcal {X}} \,:\, \Vert {{\varvec{X}} - {\varvec{Z}}}\Vert \le \Vert {{\varvec{X}} - {\varvec{P}}{\varvec{Z}}}\Vert \text { for all } {\varvec{P}} \in \varPi ^* \right\} }\)
is a fundamental domain, called the Dirichlet fundamental domain of \({\varvec{Z}}\).
Proof
[30], Theorem 6.6.13. \(\square \)
Lemma 1
Let \({\mathcal {D}}_{{\varvec{Z}}}\) be a Dirichlet fundamental domain of representation \({\varvec{Z}}\) of an asymmetric partition \(Z \in {\mathcal {P}}\). Suppose that \({\varvec{X}}\) and \({\varvec{X}}'\) are two different representations of a partition X such that \({\varvec{X}}, {\varvec{X}}' \in {\mathcal {D}}_{{\varvec{Z}}}\). Then \({\varvec{X}}, {\varvec{X}}' \in \partial {\mathcal {D}}_{{\varvec{Z}}}\).
Proof
[19], Prop. 3.13 and [22], Prop. A.2. \(\square \)
A.3 Multiple Alignments
Let \({\mathcal {S}}_n = \mathop {\left( X_1, \ldots , X_n \right) }\) be a sample of n partitions \(X_i \in {\mathcal {P}}\). A multiple alignment of \({\mathcal {S}}_n\) is an n-tuple \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) consisting of representations \({\varvec{X}}_{i}\in X_{i}\). By the Cartesian product \(X_{1} \times \cdots \times X_{n}\) we denote the set of all multiple alignments of \({\mathcal {S}}_n\). A multiple alignment \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is said to be in optimal position with a representation \({\varvec{Z}}\) of a partition Z if all representations \({\varvec{X}}_{i}\) are in optimal position with \({\varvec{Z}}\). The mean of a multiple alignment is the matrix
\({\varvec{M}} = \frac{1}{n}\sum _{i=1}^{n} {\varvec{X}}_{i}.\)
An optimal multiple alignment is a multiple alignment that minimizes the function
\(f_n\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) } = \frac{1}{n^2} \sum _{i=1}^{n}\sum _{j=1}^{n} \Vert {{\varvec{X}}_{i} - {\varvec{X}}_{j}}\Vert ^2.\)
Thus, the problem of finding an optimal multiple alignment is that of finding a multiple alignment with smallest average pairwise squared distance in \({\mathcal {X}}\). To show the equivalence between mean partitions and optimal multiple alignments, we introduce the sets of minimizers of the Fréchet function \(F_n(Z) = \frac{1}{n}\sum _{i=1}^{n} \delta \mathop {\left( X_i, Z \right) }^2\) and of \(f_n\):
\({\mathcal {M}}(F_n) = \mathop {\mathrm {arg\,min}}\limits _{Z \in {\mathcal {P}}} F_n(Z) \quad \text {and} \quad {\mathcal {M}}(f_n) = \mathop {\mathrm {arg\,min}}\, f_n.\)
For a given sample \({\mathcal {S}}_n\), the set \({\mathcal {M}}(F_n)\) is the mean partition set and \({\mathcal {M}}(f_n)\) is the set of all optimal multiple alignments. The next result shows that any solution of \(F_n\) yields a solution of \(f_n\) and vice versa.
Theorem 3
For any sample \({\mathcal {S}}_n \in {\mathcal {P}}^n\), the map
\({\mathcal {M}}(f_n) \rightarrow {\mathcal {M}}(F_n), \quad \mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) } \mapsto \pi \mathop {\left( \tfrac{1}{n}\sum _{i=1}^{n} {\varvec{X}}_{i} \right) }\)
is surjective.
Proof
[23], Theorem 4.1. \(\square \)
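For tiny samples, the content of Theorem 3 can be checked by exhaustive search: enumerate every multiple alignment (one representation per sample partition), keep one that minimizes the pairwise squared distances, and average it. The sketch below uses hypothetical helper names and the membership-matrix representation from A.1; it is exponential in the number of clusters and the sample size, so it is an illustration only:

```python
from itertools import permutations, product

import numpy as np


def representations(X):
    """All representations of the partition [X]: the row
    permutations of the membership matrix X."""
    ell = X.shape[0]
    return [X[list(p), :] for p in permutations(range(ell))]


def optimal_alignment(sample):
    """Brute-force an optimal multiple alignment of the sample:
    the tuple of representations minimizing the sum of pairwise
    squared distances.  Per Theorem 3, the arithmetic mean of
    such a tuple represents a mean partition of the sample."""
    best, best_tuple = float("inf"), None
    for tup in product(*(representations(X) for X in sample)):
        f = sum(np.sum((A - B) ** 2) for A in tup for B in tup)
        if f < best:
            best, best_tuple = f, tup
    mean_repr = sum(best_tuple) / len(best_tuple)
    return mean_repr, best
```

For example, for two partitions that differ only by a relabeling of clusters, the search aligns them exactly, the objective drops to zero, and the mean coincides with either aligned representation.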
A.4 Proof of Theorem 2
Parts 1–8 show the assertion of Eq. (2) and Part 9 shows the assertion of Eq. (3).
1 Without loss of generality, we pick a representation \({\varvec{X}}_{*}\) of the ground-truth partition \(X_*\). Let \({\varvec{Z}}\) be a representation of Z in optimal position with \({\varvec{X}}_{*}\), and let \({\mathcal {A}}_{{\varvec{Z}}}\) denote the asymmetry ball of the representation \({\varvec{Z}}\). By construction, we have \({\varvec{X}}_{*} \in {\mathcal {A}}_{{\varvec{Z}}}\).
2 Since \(\varPi \) acts discontinuously on \({\mathcal {X}}\), there is a bijective isometry
\(\phi : {\mathcal {A}}_{{\varvec{Z}}} \rightarrow {\mathcal {A}}_Z, \quad {\varvec{X}} \mapsto \pi \mathop {\left( {\varvec{X}} \right) }\)
according to [30], Theorem 13.1.1.
3 From [22], Theorem 3.1 it follows that the mean partition M of \({\mathcal {S}}_n\) is unique. We show that \(M \in {\mathcal {A}}_Z\). Suppose that \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is a multiple alignment in optimal position with \({\varvec{Z}}\). Since \(\phi :{\mathcal {A}}_{{\varvec{Z}}} \rightarrow {\mathcal {A}}_Z\) is a bijective isometry, we have
\(\Vert {{\varvec{X}}_{i} - {\varvec{X}}_{j}}\Vert = \delta \mathop {\left( X_i, X_j \right) } \quad \text {for all } i, j \in \mathop {\left\{ 1, \ldots , n \right\} },\)
showing that the multiple alignment is optimal. From Theorem 3 it follows that
\({\varvec{M}} = \frac{1}{n}\sum _{i=1}^{n} {\varvec{X}}_{i}\)
is a representation of a mean partition M of \({\mathcal {S}}_n\). Since \({\mathcal {A}}_{{\varvec{Z}}}\) is convex, we find that \({\varvec{M}} \in {\mathcal {A}}_{{\varvec{Z}}}\) and therefore \(M \in {\mathcal {A}}_Z\).
4 From Parts 1–3 of this proof it follows that the multiple alignment \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is in optimal position with \({\varvec{X}}_{*}\). We show that there is no other multiple alignment of \({\mathcal {S}}_n\) with this property. Observe that \({\mathcal {A}}_{{\varvec{Z}}}\) is contained in the Dirichlet fundamental domain \({\mathcal {D}}_{{\varvec{Z}}}\) of the representation \({\varvec{Z}}\). Let \({\mathcal {S}}_{{\varvec{Z}}} = \phi ^{-1}({\mathcal {S}}_Q)\) be the representation of the support \({\mathcal {S}}_Q\) in \({\mathcal {A}}_{{\varvec{Z}}}\). Then by assumption, we have \({\mathcal {S}}_{{\varvec{Z}}} \subseteq {\mathcal {A}}_{{\varvec{Z}}}^\circ \subset {\mathcal {D}}_{{\varvec{Z}}}\), showing that \({\mathcal {S}}_{{\varvec{Z}}}\) lies in the interior of \({\mathcal {D}}_{{\varvec{Z}}}\). From the definition of a fundamental domain together with Lemma 1, it follows that \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is the unique multiple alignment in optimal position with \({\varvec{X}}_{*}\).
5 By the same argument as in the previous part of this proof, we find that \({\varvec{M}}\) is the unique representation of M in optimal position with \({\varvec{X}}_{*}\).
6 Let \(z \in {\mathcal {Z}}\) be a data point. Since \({\varvec{X}}_{i} \in X_i\) is the unique representation in optimal position with \({\varvec{X}}_{*}\), the vote of \(X_i\) on data point z is of the form \(V_{X_i}(z) = V_{{\varvec{X}}_{i}}(z)\) for all \(i \in \mathop {\left\{ 1, \ldots , n \right\} }\). With the same argument, we have \(V_n(z) = V_{M}(z) = V_{{\varvec{M}}}(z)\).
7 By \({\varvec{x}}^{(i)}(z)\) we denote the column of \({\varvec{X}}_{i}\) that represents z, and by \({\varvec{x}}_{*}(z)\) the corresponding column of \({\varvec{X}}_{*}\). By definition, the vote \(V_{{\varvec{X}}_{i}}(z)\) is determined by the column \({\varvec{x}}^{(i)}(z)\) for all \(i \in \mathop {\left\{ 1, \ldots , n \right\} }\). Since \(X_i\) and \(X_*\) are both hard partitions, their columns are unit vectors, and we find that
\({\varvec{x}}^{(i)}(z)^{\!\top } {\varvec{x}}_{*}(z) = \mathbb {I}\mathop {\left( V_{X_i}(z) = V_{X_*}(z) \right) },\)
where \(\mathbb {I}\) denotes the indicator function.
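Because hard-partition columns are one-hot unit vectors, the indicator of vote agreement reduces to an inner product, and averaging those inner products gives the fraction of sample partitions that classify z as the ground truth does. A minimal numpy sketch with a hypothetical helper name:

```python
import numpy as np


def agreement(columns, x_star):
    """Fraction of one-hot columns equal to the ground-truth column
    x_star.  For unit basis vectors, the inner product x @ x_star
    equals the indicator I(x == x_star), so the mean of the inner
    products is the fraction of agreeing votes."""
    return float(np.mean([x @ x_star for x in columns]))
```

For instance, if two of three sample partitions place z in the same cluster as the ground truth, the agreement is 2/3.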
8 From the Mean Partition Theorem it follows that
\({\varvec{m}}(z) = \frac{1}{n}\sum _{i=1}^{n} {\varvec{x}}^{(i)}(z)\)
is the column of \({\varvec{M}}\) that represents z, where \({\varvec{x}}^{(i)}(z)\) denotes the column of \({\varvec{X}}_{i}\) representing z and \({\varvec{x}}_{*}(z)\) the corresponding column of \({\varvec{X}}_{*}\). Then the agreement of \({\varvec{M}}\) on z is given by
\(k_{{\varvec{M}}}(z) = {\varvec{m}}(z)^{\!\top } {\varvec{x}}_{*}(z) = \frac{1}{n}\sum _{i=1}^{n} \mathbb {I}\mathop {\left( V_{X_i}(z) = V_{X_*}(z) \right) }.\)
Thus, the agreement \(k_{{\varvec{M}}}(z)\) counts the fraction of sample partitions \(X_i\) that correctly classify z. Let
\(p_n = P\mathop {\left( k_{{\varvec{M}}}(z) > 1/2 \right) }\)
denote the probability that the majority of the sample partitions \(X_i\) correctly classifies z, and let p denote the probability that a single sample partition correctly classifies z. Since the votes of the sample partitions are assumed to be independent, we can compute \(p_n\) via the binomial distribution:
\(p_n = \sum _{k=r}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) p^k (1-p)^{n-k},\)
where \(r = \lfloor n/2 \rfloor + 1\) and \(\lfloor a \rfloor \) is the largest integer b with \(b \le a\). Then the assertion of Eq. (2) follows from [16], Theorem 1.
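The binomial expression for \(p_n\) is straightforward to evaluate numerically. The sketch below (hypothetical function name) computes the majority probability and exhibits the Condorcet effect: for \(p > 1/2\), the probability \(p_n\) grows toward one as n increases:

```python
from math import comb, floor


def majority_prob(n, p):
    """Probability p_n that a strict majority of n independent
    votes, each correct with probability p, is correct:

        p_n = sum_{k=r}^{n} C(n, k) p^k (1-p)^(n-k),

    where r = floor(n/2) + 1 is the smallest majority size."""
    r = floor(n / 2) + 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(r, n + 1))
```

For example, with p = 0.6 a single vote is correct with probability 0.6, three votes give a majority probability of 0.648, and larger odd n push the probability still higher.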
9 We show the assertion of Eq. (3). By assumption, the support \({\mathcal {S}}_Q\) is contained in an open subset of the asymmetry ball \({\mathcal {A}}_Z\). From [22], Theorem 3.1 it follows that the expected partition \(M_Q\) of Q is unique. Then the sequence \((M_n)_{n \in \mathbb {N}}\) converges almost surely to the expected partition \(M_Q\) according to [20], Theorems 3.1 and 3.3. From the first eight parts of the proof it follows that the limit partition \(M_Q\) agrees on any data point z almost surely with the ground-truth partition \(X_*\). This shows the assertion.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Jain, B. (2018). Condorcet’s Jury Theorem for Consensus Clustering. In: Trollmann, F., Turhan, AY. (eds) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science(), vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_14
Print ISBN: 978-3-030-00110-0
Online ISBN: 978-3-030-00111-7