Structure Preserving Embedding of Dissimilarity Data

Roth, Volker; Fuchs, Thomas J.; Vogt, Julia E.; Prabhakaran, Sandhya; Buhmann, Joachim M.

doi:10.1007/978-1-4471-5628-4_7

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Partitioning methods for observations represented by pairwise dissimilarities are studied, with particular emphasis on their behavior when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift-invariance property: any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. Shift invariance thus implies that these clustering methods are “blind” to violations of Euclidean or metric structure. From the application side, such blindness might be seen as a highly desirable feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property might be viewed as a “negative” result, since studying these algorithms will not lead to any new insights into the role of metricity in clustering problems.
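
The shift invariance described in the abstract can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the chapter's own implementation: the cost is assumed to take the commonly used Pairwise Clustering form (half the within-cluster sum of dissimilarities, divided by cluster size), the shift is assumed to be a constant added to all off-diagonal entries, and the number of clusters is held fixed. The function names and the random test matrix are hypothetical.

```python
import numpy as np

def pairwise_clustering_cost(D, labels):
    # Assumed cost: 0.5 * sum over clusters c of (1/n_c) * sum_{i,j in c} D[i, j].
    cost = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        cost += 0.5 * D[np.ix_(idx, idx)].sum() / len(idx)
    return cost

def off_diagonal_shift(D, d0):
    # Add d0 to every off-diagonal entry; the diagonal stays zero.
    n = D.shape[0]
    return D + d0 * (np.ones((n, n)) - np.eye(n))

def centered_min_eigenvalue(D):
    # Smallest eigenvalue of S = -1/2 Q D Q with Q = I - (1/n) 1 1^T.
    # D is a squared Euclidean distance matrix iff S is positive semidefinite.
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-0.5 * Q @ D @ Q).min()

# Hypothetical data: a random symmetric, zero-diagonal matrix that is
# generally not embeddable as squared Euclidean distances.
rng = np.random.default_rng(0)
n, k = 12, 3
A = rng.random((n, n))
D = A + A.T
np.fill_diagonal(D, 0.0)

# Two different partitions with the same number of clusters k.
labels_a = np.arange(n) % k
labels_b = np.repeat(np.arange(k), n // k)

# Shift by -2 * lambda_min, which makes the centered matrix PSD.
lam_min = centered_min_eigenvalue(D)
d0 = -2.0 * lam_min
D_shift = off_diagonal_shift(D, d0)

gap_before = pairwise_clustering_cost(D, labels_a) - pairwise_clustering_cost(D, labels_b)
gap_after = pairwise_clustering_cost(D_shift, labels_a) - pairwise_clustering_cost(D_shift, labels_b)
cost_change = pairwise_clustering_cost(D_shift, labels_a) - pairwise_clustering_cost(D, labels_a)
lam_min_shift = centered_min_eigenvalue(D_shift)

print("cost gap between partitions before shift:", gap_before)
print("cost gap between partitions after shift: ", gap_after)
print("cost change vs. d0/2 * (n - k):", cost_change, 0.5 * d0 * (n - k))
print("min eigenvalue before / after shift:", lam_min, lam_min_shift)
```

Under these assumptions the shift changes every k-cluster partition's cost by the same amount, d0/2 · (n − k), so the ranking of partitions, and hence the optimal one, is unchanged, while the shifted matrix admits a Euclidean embedding.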

Author information

Authors and Affiliations

Computer Science Department, University of Basel, Basel, Switzerland
Volker Roth, Julia E. Vogt & Sandhya Prabhakaran
California Institute of Technology, Pasadena, CA, 91125, USA
Thomas J. Fuchs
Swiss Federal Institute of Technology Zurich, Zurich, Switzerland
Joachim M. Buhmann

Corresponding author

Correspondence to Volker Roth.

Editor information

Editors and Affiliations

DAIS, Ca' Foscari University of Venice, Venezia Mestre, Italy
Marcello Pelillo

About this chapter

Cite this chapter

Roth, V., Fuchs, T.J., Vogt, J.E., Prabhakaran, S., Buhmann, J.M. (2013). Structure Preserving Embedding of Dissimilarity Data. In: Pelillo, M. (eds) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5628-4_7

DOI: https://doi.org/10.1007/978-1-4471-5628-4_7
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5627-7
Online ISBN: 978-1-4471-5628-4
eBook Packages: Computer Science, Computer Science (R0)
