Abstract
Condorcet’s Jury Theorem has been invoked for ensemble classifiers to indicate that the combination of many classifiers can have better predictive performance than a single classifier. Such a theoretical underpinning is unknown for consensus clustering. This article extends Condorcet’s Jury Theorem to the mean partition approach under the additional assumptions that a unique but unknown ground-truth partition exists and sample partitions are drawn from a sufficiently small ball containing the ground-truth.
Notes
1. The support of Q is the smallest closed subset \({\mathcal {S}}_Q \subseteq {\mathcal {P}}\) such that \(Q({\mathcal {S}}_Q) = 1\).
2. Recall that a mean partition is not unique in general.
References
Berend, D., Paroush, J.: When is Condorcet's jury theorem valid? Soc. Choice Welf. 15(4), 481–488 (1998)
Bhattacharya, A., Bhattacharya, R.: Nonparametric Inference on Manifolds with Applications to Shape Spaces. Cambridge University Press, Cambridge (2012)
Bredon, G.E.: Introduction to Compact Transformation Groups. Elsevier, New York City (1972)
de Condorcet, N.C.: Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris (1785)
Dimitriadou, E., Weingessel, A., Hornik, K.: A combination scheme for fuzzy clustering. In: Advances in Soft Computing (2002)
Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45014-9_1
Domeniconi, C., Al-Razgan, M.: Weighted cluster ensembles: methods and analysis. ACM Trans. Knowl. Discov. Data 2(4), 1–40 (2009)
Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis. Wiley, Hoboken (1998)
Feragen, A., Lo, P., De Bruijne, M., Nielsen, M., Lauze, F.: Toward a theory of statistical tree-shape analysis. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2008–2021 (2013)
Filkov, V., Skiena, S.: Integrating microarray data by consensus clustering. Int. J. Artif. Intell. Tools 13(4), 863–880 (2004)
Franek, L., Jiang, X.: Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recognit. 47(2), 833–842 (2014)
Fréchet, M.: Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l’institut Henri Poincaré 10, 215–310 (1948)
Ginestet, C.E.: Strong Consistency of Fréchet Sample Mean Sets for Graph-Valued Random Variables. arXiv: 1204.3183 (2012)
Ghaemi, R., Sulaiman, N., Ibrahim, H., Mustapha, N.: A survey: clustering ensembles techniques. Proc. World Acad. Sci. Eng. Technol. 38, 644–657 (2009)
Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. ACM Trans. Knowl. Discov. Data 1(1), 341–352 (2007)
Grofman, B., Owen, G., Feld, S.L.: Thirteen theorems in search of the truth. Theory Decis. 15(3), 261–278 (1983)
Huckemann, S., Hotz, T., Munk, A.: Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica 20, 1–100 (2010)
Jain, B.J., Obermayer, K.: Structure spaces. J. Mach. Learn. Res. 10, 2667–2714 (2009)
Jain, B.J.: Geometry of Graph Edit Distance Spaces. arXiv: 1505.08071 (2015)
Jain, B.J.: Asymptotic Behavior of Mean Partitions in Consensus Clustering. arXiv:1512.06061 (2015)
Jain, B.J.: Statistical analysis of graphs. Pattern Recognit. 60, 802–812 (2016)
Jain, B.J.: Homogeneity of Cluster Ensembles. arXiv:1602.02543 (2016)
Jain, B.J.: The Mean Partition Theorem of Consensus Clustering. arXiv:1604.06626 (2016)
Kendall, D.G.: Shape manifolds, procrustean metrics, and complex projective spaces. Bul. Lond. Math. Soc. 16, 81–121 (1984)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley, Hoboken (2004)
Lam, L., Suen, C.Y.: Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 27(5), 553–568 (1997)
Li, T., Ding, C., Jordan, M.I.: Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: IEEE International Conference on Data Mining (2007)
Marron, J.S., Alonso, A.M.: Overview of object oriented data analysis. Biom. J. 56(5), 732–753 (2014)
Polikar, R.: Ensemble learning. Scholarpedia 4(1), 2776 (2009)
Ratcliffe, J.G.: Foundations of Hyperbolic Manifolds. Springer, New York (2006). https://doi.org/10.1007/978-0-387-47322-2
Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39 (2010)
Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Surowiecki, J.: The Wisdom of Crowds. Anchor, New York City (2005)
Topchy, A.P., Jain, A.K., Punch, W.: Clustering ensembles: models of consensus and weak partitions. IEEE Trans. Pattern Anal. Mach. Intell. 27(12), 1866–1881 (2005)
Vega-Pons, S., Correa-Morris, J., Ruiz-Shulcloper, J.: Weighted partition consensus via kernels. Pattern Recognit. 43(8), 2712–2724 (2010)
Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recognit. Artif. Intell. 25(03), 337–372 (2011)
Waldron, J.: The wisdom of the multitude: some reflections on Book III, Chapter 11 of the Politics. Polit. Theory 23, 563–584 (1995)
Wang, H., Marron, J.S.: Object oriented data analysis: sets of trees. Ann. Stat. 35, 1849–1873 (2007)
Yang, F., Li, X., Li, Q., Li, T.: Exploring the diversity in cluster ensemble generation: random sampling and random projection. Expert Syst. Appl. 41(10), 4844–4866 (2014)
Zhou, Z.: Ensemble Methods: Foundations and Algorithms. Taylor & Francis Group, LLC, Abingdon (2012)
A Proof of Theorem 2
To prove Theorem 2, it is helpful to use a suitable representation of partitions. We represent partitions as points of a geometric space, called an orbit space [20]. Orbit spaces are well explored, possess a rich geometric structure, and have a natural connection to Euclidean spaces [3, 19, 30].
A.1 Partition Spaces
The group \(\varPi = \varPi ^\ell \) of all (\(\ell \times \ell \))-permutation matrices is a discontinuous group that acts on \({\mathcal {X}}\) by matrix multiplication, that is, \(({\varvec{P}}, {\varvec{X}}) \mapsto {\varvec{P}}{\varvec{X}}\) for all \({\varvec{P}} \in \varPi \) and \({\varvec{X}} \in {\mathcal {X}}\).
The orbit of \({\varvec{X}} \in {\mathcal {X}}\) is the set \(\mathop {\left[ {\varvec{X}} \right] } = \mathop {\left\{ {\varvec{PX}} \,:\, {\varvec{P}} \in \varPi \right\} }\). The orbit space of partitions is the quotient space \({\mathcal {X}}/\varPi = \mathop {\left\{ \mathop {\left[ {\varvec{X}} \right] } \,:\, {\varvec{X}} \in {\mathcal {X}} \right\} }\) obtained by the action of the permutation group \(\varPi \) on the set \({\mathcal {X}}\). We write \({\mathcal {P}} = {\mathcal {X}}/\varPi \) to denote the partition space and \(X \in {\mathcal {P}}\) to denote an orbit \([{\varvec{X}}] \in {\mathcal {X}}/\varPi \). The natural projection \(\pi : {\mathcal {X}} \rightarrow {\mathcal {P}}\) sends matrices \({\varvec{X}}\) to the partitions \(\pi ({\varvec{X}}) = \mathop {\left[ {\varvec{X}} \right] }\) they represent. The partition space \({\mathcal {P}}\) is endowed with the intrinsic metric \(\delta \) defined by \(\delta (X, Y) = \min \mathop {\left\{ \Vert {{\varvec{X}} - {\varvec{Y}}}\Vert \,:\, {\varvec{X}} \in X, {\varvec{Y}} \in Y \right\} }\).
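As an illustration, the intrinsic metric \(\delta \) can be evaluated by brute force when the number of clusters is small. The sketch below (a hypothetical helper, not part of the paper) represents a hard partition of m data points as an (\(\ell \times m\)) membership matrix and minimizes the Frobenius distance over all row permutations, i.e. over all representations of one orbit:

```python
from itertools import permutations

import numpy as np


def partition_metric(X, Y):
    """Intrinsic metric delta(X, Y) = min ||P X - Y|| over all
    (ell x ell)-permutation matrices P.  Brute force over the
    ell! permutations, so only feasible for small ell.

    X, Y: (ell x m) membership matrices representing hard
    partitions of the same m data points.
    """
    ell = X.shape[0]
    best = float("inf")
    for perm in permutations(range(ell)):
        # X[list(perm), :] equals P @ X for the permutation matrix P
        # that reorders the rows (clusters) of X according to perm.
        best = min(best, np.linalg.norm(X[list(perm), :] - Y))
    return best
```

Since \(\delta \) minimizes over representations, two membership matrices that differ only by a relabeling of clusters have distance zero, as the definition of the orbit space requires.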
A.2 Dirichlet Fundamental Domains
We use the following notations: By \(\overline{{\mathcal {U}}}\) we denote the closure of a subset \({\mathcal {U}} \subseteq {\mathcal {X}}\), by \(\partial {\mathcal {U}}\) the boundary of \({\mathcal {U}}\), and by \({\mathcal {U}}^\circ \) the open subset \(\overline{{\mathcal {U}}} \setminus \partial {\mathcal {U}}\). The action of permutation \({\varvec{P}} \in \varPi \) on the subset \({\mathcal {U}}\subseteq {\mathcal {X}}\) is the set defined by \({\varvec{P}}\,{\mathcal {U}} = \mathop {\left\{ {\varvec{PX}} \, :\, {\varvec{X}} \in {\mathcal {U}} \right\} }\). By \(\varPi ^* = \varPi \setminus \mathop {\left\{ {\varvec{I}} \right\} }\) we denote the subset of (\(\ell \times \ell \))-permutation matrices without identity matrix \({\varvec{I}}\).
A subset \({\mathcal {F}}\) of \({\mathcal {X}}\) is a fundamental set for \(\varPi \) if and only if \({\mathcal {F}}\) contains exactly one representation \({\varvec{X}}\) from each orbit \(\mathop {\left[ {\varvec{X}} \right] } \in {\mathcal {X}}/\varPi \). A fundamental domain of \(\varPi \) in \({\mathcal {X}}\) is a closed connected set \({\mathcal {F}} \subseteq {\mathcal {X}}\) that satisfies
1. \(\displaystyle {\mathcal {X}} = \bigcup _{{\varvec{P}} \in \varPi } {\varvec{P}}{\mathcal {F}}\),
2. \({\varvec{P}} {\mathcal {F}}^\circ \cap {\mathcal {F}}^\circ = \emptyset \) for all \({\varvec{P}} \in \varPi ^*\).
Proposition 1
Let \({\varvec{Z}}\) be a representation of an asymmetric partition \(Z \in {\mathcal {P}}\). Then the set
\({\mathcal {D}}_{{\varvec{Z}}} = \mathop {\left\{ {\varvec{X}} \in {\mathcal {X}} \,:\, \Vert {{\varvec{X}} - {\varvec{Z}}}\Vert \le \Vert {{\varvec{X}} - {\varvec{P}}{\varvec{Z}}}\Vert \text { for all } {\varvec{P}} \in \varPi ^* \right\} }\)
is a fundamental domain, called the Dirichlet fundamental domain of \({\varvec{Z}}\).
Proof
[30], Theorem 6.6.13. \(\square \)
Lemma 1
Let \({\mathcal {D}}_{{\varvec{Z}}}\) be a Dirichlet fundamental domain of representation \({\varvec{Z}}\) of an asymmetric partition \(Z \in {\mathcal {P}}\). Suppose that \({\varvec{X}}\) and \({\varvec{X}}'\) are two different representations of a partition X such that \({\varvec{X}}, {\varvec{X}}' \in {\mathcal {D}}_{{\varvec{Z}}}\). Then \({\varvec{X}}, {\varvec{X}}' \in \partial {\mathcal {D}}_{{\varvec{Z}}}\).
Proof
[19], Prop. 3.13 and [22], Prop. A.2. \(\square \)
A.3 Multiple Alignments
Let \({\mathcal {S}}_n = \mathop {\left( X_1, \ldots , X_n \right) }\) be a sample of n partitions \(X_i \in {\mathcal {P}}\). A multiple alignment of \({\mathcal {S}}_n\) is an n-tuple \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) consisting of representations \({\varvec{X}}_{i}\in X_{i}\). By the Cartesian product \(X_{1} \times \cdots \times X_{n}\) we denote the set of all multiple alignments of \({\mathcal {S}}_n\). A multiple alignment \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is said to be in optimal position with a representation \({\varvec{Z}}\) of a partition Z if all representations \({\varvec{X}}_{i}\) are in optimal position with \({\varvec{Z}}\). The mean of a multiple alignment is the matrix
\({\varvec{M}} = \frac{1}{n}\sum _{i=1}^{n} {\varvec{X}}_{i}.\)
An optimal multiple alignment is a multiple alignment that minimizes the function
\(f_n\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) } = \frac{1}{n^2} \sum _{i=1}^{n}\sum _{j=1}^{n} \Vert {{\varvec{X}}_{i} - {\varvec{X}}_{j}}\Vert ^2.\)
Thus, the problem of finding an optimal multiple alignment is that of finding a multiple alignment with smallest average pairwise squared distance in \({\mathcal {X}}\). To show the equivalence between mean partitions and optimal multiple alignments, we introduce the sets of minimizers of the Fréchet function \(F_n(Z) = \frac{1}{n}\sum _{i=1}^{n} \delta \mathop {\left( X_i, Z \right) }^2\) and of \(f_n\):
\({\mathcal {M}}(F_n) = \mathop {\mathrm {arg\,min}}\limits _{Z \in {\mathcal {P}}} F_n(Z) \quad \text {and} \quad {\mathcal {M}}(f_n) = \mathop {\mathrm {arg\,min}}\, f_n.\)
For a given sample \({\mathcal {S}}_n\), the set \({\mathcal {M}}(F_n)\) is the mean partition set and \({\mathcal {M}}(f_n)\) is the set of all optimal multiple alignments. The next result shows that any solution of \(F_n\) yields a solution of \(f_n\) and vice versa.
Theorem 3
For any sample \({\mathcal {S}}_n \in {\mathcal {P}}^n\), the map
\({\mathcal {M}}(f_n) \rightarrow {\mathcal {M}}(F_n), \quad \mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) } \mapsto \pi \mathop {\left( \tfrac{1}{n}\sum _{i=1}^{n} {\varvec{X}}_{i} \right) }\)
is surjective.
Proof
[23], Theorem 4.1. \(\square \)
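For tiny samples, the content of Theorem 3 can be checked by exhaustive search: enumerate every multiple alignment (one representation per sample partition), keep one that minimizes the pairwise squared distances, and average it. The sketch below uses hypothetical helper names and the membership-matrix representation from A.1; it is exponential in the number of clusters and the sample size, so it is an illustration only:

```python
from itertools import permutations, product

import numpy as np


def representations(X):
    """All representations of the partition [X]: the row
    permutations of the membership matrix X."""
    ell = X.shape[0]
    return [X[list(p), :] for p in permutations(range(ell))]


def optimal_alignment(sample):
    """Brute-force an optimal multiple alignment of the sample:
    the tuple of representations minimizing the sum of pairwise
    squared distances.  Per Theorem 3, the arithmetic mean of
    such a tuple represents a mean partition of the sample."""
    best, best_tuple = float("inf"), None
    for tup in product(*(representations(X) for X in sample)):
        f = sum(np.sum((A - B) ** 2) for A in tup for B in tup)
        if f < best:
            best, best_tuple = f, tup
    mean_repr = sum(best_tuple) / len(best_tuple)
    return mean_repr, best
```

For example, for two partitions that differ only by a relabeling of clusters, the search aligns them exactly, the objective drops to zero, and the mean coincides with either aligned representation.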
A.4 Proof of Theorem 2
Parts 1–8 show the assertion of Eq. (2) and Part 9 shows the assertion of Eq. (3).
1 Without loss of generality, we pick a representation \({\varvec{X}}_{*}\) of the ground-truth partition \(X_*\). Let \({\varvec{Z}}\) be a representation of Z in optimal position with \({\varvec{X}}_{*}\), and let \({\mathcal {A}}_{{\varvec{Z}}}\) denote the asymmetry ball of the representation \({\varvec{Z}}\). By construction, we have \({\varvec{X}}_{*} \in {\mathcal {A}}_{{\varvec{Z}}}\).
2 Since \(\varPi \) acts discontinuously on \({\mathcal {X}}\), there is a bijective isometry
\(\phi : {\mathcal {A}}_{{\varvec{Z}}} \rightarrow {\mathcal {A}}_Z, \quad {\varvec{X}} \mapsto \pi \mathop {\left( {\varvec{X}} \right) }\)
according to [30], Theorem 13.1.1.
3 From [22], Theorem 3.1 it follows that the mean partition M of \({\mathcal {S}}_n\) is unique. We show that \(M \in {\mathcal {A}}_Z\). Suppose that \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is a multiple alignment in optimal position with \({\varvec{Z}}\). Since \(\phi :{\mathcal {A}}_{{\varvec{Z}}} \rightarrow {\mathcal {A}}_Z\) is a bijective isometry, we have
\(\Vert {{\varvec{X}}_{i} - {\varvec{X}}_{j}}\Vert = \delta \mathop {\left( X_i, X_j \right) } \quad \text {for all } i, j \in \mathop {\left\{ 1, \ldots , n \right\} },\)
showing that the multiple alignment is optimal. From Theorem 3 it follows that
\({\varvec{M}} = \frac{1}{n}\sum _{i=1}^{n} {\varvec{X}}_{i}\)
is a representation of a mean partition M of \({\mathcal {S}}_n\). Since \({\mathcal {A}}_{{\varvec{Z}}}\) is convex, we find that \({\varvec{M}} \in {\mathcal {A}}_{{\varvec{Z}}}\) and therefore \(M \in {\mathcal {A}}_Z\).
4 From Parts 1–3 of this proof it follows that the multiple alignment \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is in optimal position with \({\varvec{X}}_{*}\). We show that there is no other multiple alignment of \({\mathcal {S}}_n\) with this property. Observe that \({\mathcal {A}}_{{\varvec{Z}}}\) is contained in the Dirichlet fundamental domain \({\mathcal {D}}_{{\varvec{Z}}}\) of the representation \({\varvec{Z}}\). Let \({\mathcal {S}}_{{\varvec{Z}}} = \phi ^{-1}({\mathcal {S}}_Q)\) be the representation of the support \({\mathcal {S}}_Q\) in \({\mathcal {A}}_{{\varvec{Z}}}\). Then by assumption, we have \({\mathcal {S}}_{{\varvec{Z}}} \subseteq {\mathcal {A}}_{{\varvec{Z}}}^\circ \subset {\mathcal {D}}_{{\varvec{Z}}}\), showing that \({\mathcal {S}}_{{\varvec{Z}}}\) lies in the interior of \({\mathcal {D}}_{{\varvec{Z}}}\). From the definition of a fundamental domain together with Lemma 1, it follows that \(\mathop {\left( {\varvec{X}}_{1}, \ldots , {\varvec{X}}_{n} \right) }\) is the unique multiple alignment in optimal position with \({\varvec{X}}_{*}\).
5 By the same argument as in the previous part of this proof, we find that \({\varvec{M}}\) is the unique representation of M in optimal position with \({\varvec{X}}_{*}\).
6 Let \(z \in {\mathcal {Z}}\) be a data point. Since \({\varvec{X}}_{i} \in X_i\) is the unique representation in optimal position with \({\varvec{X}}_{*}\), the vote of \(X_i\) on data point z is of the form \(V_{X_i}(z) = V_{{\varvec{X}}_{i}}(z)\) for all \(i \in \mathop {\left\{ 1, \ldots , n \right\} }\). With the same argument, we have \(V_n(z) = V_{M}(z) = V_{{\varvec{M}}}(z)\).
7 By \({\varvec{x}}^{(i)}(z)\) we denote the column of \({\varvec{X}}_{i}\) that represents z, and by \({\varvec{x}}_{*}(z)\) the corresponding column of \({\varvec{X}}_{*}\). By definition, the vote \(V_{{\varvec{X}}_{i}}(z)\) is determined by the column \({\varvec{x}}^{(i)}(z)\) for all \(i \in \mathop {\left\{ 1, \ldots , n \right\} }\). Since \(X_i\) and \(X_*\) are both hard partitions, their columns are unit vectors, and we find that
\({\varvec{x}}^{(i)}(z)^{\!\top } {\varvec{x}}_{*}(z) = \mathbb {I}\mathop {\left( V_{X_i}(z) = V_{X_*}(z) \right) },\)
where \(\mathbb {I}\) denotes the indicator function.
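Because hard-partition columns are one-hot unit vectors, the indicator of vote agreement reduces to an inner product, and averaging those inner products gives the fraction of sample partitions that classify z as the ground truth does. A minimal numpy sketch with a hypothetical helper name:

```python
import numpy as np


def agreement(columns, x_star):
    """Fraction of one-hot columns equal to the ground-truth column
    x_star.  For unit basis vectors, the inner product x @ x_star
    equals the indicator I(x == x_star), so the mean of the inner
    products is the fraction of agreeing votes."""
    return float(np.mean([x @ x_star for x in columns]))
```

For instance, if two of three sample partitions place z in the same cluster as the ground truth, the agreement is 2/3.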
8 From the Mean Partition Theorem it follows that
\({\varvec{m}}(z) = \frac{1}{n}\sum _{i=1}^{n} {\varvec{x}}^{(i)}(z)\)
is the column of \({\varvec{M}}\) that represents z, where \({\varvec{x}}^{(i)}(z)\) denotes the column of \({\varvec{X}}_{i}\) representing z and \({\varvec{x}}_{*}(z)\) the corresponding column of \({\varvec{X}}_{*}\). Then the agreement of \({\varvec{M}}\) on z is given by
\(k_{{\varvec{M}}}(z) = {\varvec{m}}(z)^{\!\top } {\varvec{x}}_{*}(z) = \frac{1}{n}\sum _{i=1}^{n} \mathbb {I}\mathop {\left( V_{X_i}(z) = V_{X_*}(z) \right) }.\)
Thus, the agreement \(k_{{\varvec{M}}}(z)\) counts the fraction of sample partitions \(X_i\) that correctly classify z. Let
\(p_n = P\mathop {\left( k_{{\varvec{M}}}(z) > 1/2 \right) }\)
denote the probability that the majority of the sample partitions \(X_i\) correctly classifies z, and let p denote the probability that a single sample partition correctly classifies z. Since the votes of the sample partitions are assumed to be independent, we can compute \(p_n\) via the binomial distribution:
\(p_n = \sum _{k=r}^{n} \left( {\begin{array}{c}n\\ k\end{array}}\right) p^k (1-p)^{n-k},\)
where \(r = \lfloor n/2 \rfloor + 1\) and \(\lfloor a \rfloor \) is the largest integer b with \(b \le a\). Then the assertion of Eq. (2) follows from [16], Theorem 1.
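The binomial expression for \(p_n\) is straightforward to evaluate numerically. The sketch below (hypothetical function name) computes the majority probability and exhibits the Condorcet effect: for \(p > 1/2\), the probability \(p_n\) grows toward one as n increases:

```python
from math import comb, floor


def majority_prob(n, p):
    """Probability p_n that a strict majority of n independent
    votes, each correct with probability p, is correct:

        p_n = sum_{k=r}^{n} C(n, k) p^k (1-p)^(n-k),

    where r = floor(n/2) + 1 is the smallest majority size."""
    r = floor(n / 2) + 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(r, n + 1))
```

For example, with p = 0.6 a single vote is correct with probability 0.6, three votes give a majority probability of 0.648, and larger odd n push the probability still higher.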
9 We show the assertion of Eq. (3). By assumption, the support \({\mathcal {S}}_Q\) is contained in an open subset of the asymmetry ball \({\mathcal {A}}_Z\). From [22], Theorem 3.1 it follows that the expected partition \(M_Q\) of Q is unique. Then the sequence \((M_n)_{n \in \mathbb {N}}\) converges almost surely to the expected partition \(M_Q\) according to [20], Theorems 3.1 and 3.3. From the first eight parts of the proof it follows that the limit partition \(M_Q\) agrees on any data point z almost surely with the ground-truth partition \(X_*\). This shows the assertion.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Jain, B. (2018). Condorcet’s Jury Theorem for Consensus Clustering. In: Trollmann, F., Turhan, AY. (eds) KI 2018: Advances in Artificial Intelligence. KI 2018. Lecture Notes in Computer Science(), vol 11117. Springer, Cham. https://doi.org/10.1007/978-3-030-00111-7_14
Print ISBN: 978-3-030-00110-0
Online ISBN: 978-3-030-00111-7