Abstract
Partitioning methods for observations represented by pairwise dissimilarities are studied. Particular emphasis is put on their properties when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift invariance property which basically means that any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds true for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. This shift-invariance property essentially means that these clustering methods are “blind” against Euclidean or metric violations. From the application side, such blindness against metric violations might be seen as a highly desired feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property might be viewed as a “negative” result, since studying these algorithms will not lead to any new insights on the role of metricity in clustering problems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Jolliffe, I.T.: Principal Component Analysis. Springer, New York (1986)
Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12(2), 181–201 (2001)
Roth, V., Laub, J., Kawanabe, M., Buhmann, J.M.: Optimal cluster preserving embedding of non-metric proximity data. IEEE Trans. Pattern Anal. Mach. Intell. 25(12) (2003)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Puzicha, J., Hofmann, T., Buhmann, J.: A theory of proximity based clustering: structure detection by optimization. Pattern Recognit. 33(4), 617–634 (1999)
Hofmann, T., Buhmann, J.: Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19(1), 1–14 (1997)
Brucker, P.: On the complexity of clustering problems. In: Beckman, M., Kunzi, H.P. (eds.) Optimization and Operations Research: Lecture Notes in Economics and Mathematical Systems, pp. 45–54. Springer, Berlin (1978)
Torgerson, W.S.: Theory and Methods of Scaling. Wiley, New York (1958)
Young, G., Householder, A.S.: Discussion of a set of points in terms of their mutual distances. Psychometrika 3, 19–22 (1938)
Schölkopf, B., Smola, A., Müller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998)
McCullagh, P., Yang, J.: How many clusters? Bayesian Anal. 3, 101–120 (2008)
Vogt, J., Prabhakaran, S., Fuchs, T., Roth, V.: The translation-invariant Wishart–Dirichlet process for clustering distance data. In: Proceedings of the 27th International Conference on Machine Learning (2010)
Prabhakaran, S., Boehm, A., Metzner, K.J., Roth, V.: Recovering networks from distance data. J. Mach. Learn. Res. 25, 349–364 (2012)
Pitman, J.: Combinatorial stochastic processes. In: Picard, J. (ed.) Ecole d’Ete de Probabilites de Saint-Flour XXXII-2002. Springer, Berlin (2006)
MacEachern, S.N.: Estimating normal means with a conjugate-style Dirichlet process prior. Commun. Stat., Simul. Comput. 23, 727–741 (1994)
Dahl, D.B.: Sequentially-allocated merge-split sampler for conjugate and non-conjugate Dirichlet process mixture models. Technical report, Department of Statistics, Texas A&M University (2005)
Ewens, W.: The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3, 87–112 (1972)
Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9, 249–265 (2000)
Blei, D., Jordan, M.: Variational inference for Dirichlet process mixtures. Bayesian Anal. 1, 121–144 (2006)
Srivastava, M.S.: Singular Wishart and multivariate beta distributions. Ann. Stat. (2003)
McCullagh, P.: Marginal likelihood for distance matrices. Stat. Sin. 19, 631–649 (2009)
Cox, T.F., Cox, M.A.A.: Multidimensional Scaling. Chapman & Hall, London (2001)
Roth, V., Laub, J., Buhmann, J.M., Müller, K.-R.: Going metric: denoising pairwise data. In: Thrun, S., Becker, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 817–824. MIT Press, Cambridge (2003)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315, 972–976 (2007)
Tannapfel, A., Hahn, H.A., Katalinic, A., Fietkau, R.J., Kühn, R., Wittekind, C.W.: Prognostic value of ploidy and proliferation markers in renal cell carcinoma. Cancer 77(1), 164–171 (1996)
Fuchs, T.J., Wild, P.J., Moch, H., Buhmann, J.M.: Computational pathology analysis of tissue microarrays predicts survival of renal clear cell carcinoma patients. In: Medical Image Computing and Computer-Assisted Intervention. MICCAI 2008. Lecture Notes in Computer Science, vol. 5242, pp. 1–8. Springer, Berlin (2008)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Ahonen, T., Hadid, A., Pietikainen, M.: Face recognition with local binary patterns. In: ECCV 2004, vol. 3021, pp. 469–481 (2004)
Buhmann, J.M.: Information theoretic model validation for clustering. In: International Symposium on Information Theory, Austin Texas, pp. 1398–1402. IEEE Press, New York (2010). doi:10.1109/ISIT.2010.5513616
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Roth, V., Fuchs, T.J., Vogt, J.E., Prabhakaran, S., Buhmann, J.M. (2013). Structure Preserving Embedding of Dissimilarity Data. In: Pelillo, M. (eds) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5628-4_7
Download citation
DOI: https://doi.org/10.1007/978-1-4471-5628-4_7
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5627-7
Online ISBN: 978-1-4471-5628-4
eBook Packages: Computer ScienceComputer Science (R0)