Structure Preserving Embedding of Dissimilarity Data


Abstract

Partitioning methods for observations represented by pairwise dissimilarities are studied, with particular emphasis on their behavior when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift-invariance property: any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. Because of this shift invariance, these clustering methods are “blind” to Euclidean or metric violations. From the application side, such blindness might be seen as a highly desirable feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property might be viewed as a “negative” result, since studying these algorithms will not lead to any new insights into the role of metricity in clustering problems.
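
The shift invariance described in the abstract can be checked numerically on a toy problem. The sketch below is an illustration, not the chapter's own code. It assumes the commonly used form of the Pairwise Clustering cost (for each cluster, the sum of within-cluster dissimilarities divided by twice the cluster size) and an off-diagonal shift that adds a constant `d0` to every entry except the diagonal. The function names, the matrix size, and the values of `n`, `k` and `d0` are hypothetical choices made for this example. Exact rational arithmetic is used so that costs can be compared without rounding issues.

```python
import itertools
import random
from fractions import Fraction


def random_dissimilarities(n, seed=0):
    """Symmetric, zero-diagonal integer matrix with arbitrary
    (not necessarily metric) off-diagonal entries."""
    rng = random.Random(seed)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = rng.randint(0, 100)
    return D


def off_diagonal_shift(D, d0):
    """Add d0 to every off-diagonal entry; the diagonal stays zero."""
    n = len(D)
    return [[D[i][j] + (d0 if i != j else 0) for j in range(n)] for i in range(n)]


def pairwise_clustering_cost(D, labels, k):
    """Assumed cost: sum over clusters of (within-cluster sum) / (2 * cluster size)."""
    cost = Fraction(0)
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        within = sum(D[i][j] for i in members for j in members)
        cost += Fraction(within, 2 * len(members))
    return cost


def as_partition(labels, k):
    """Label-permutation-free representation of a clustering."""
    return frozenset(
        frozenset(i for i, lab in enumerate(labels) if lab == c) for c in range(k)
    )


def optimal_partitions(D, labelings, k):
    """All partitions that attain the minimal cost (ties included)."""
    costs = {as_partition(lab, k): pairwise_clustering_cost(D, lab, k) for lab in labelings}
    best = min(costs.values())
    return {p for p, c in costs.items() if c == best}


n, k, d0 = 6, 3, 4
D = random_dissimilarities(n)
D_shifted = off_diagonal_shift(D, d0)

# Every assignment of n objects to k clusters in which no cluster is empty.
labelings = [
    lab for lab in itertools.product(range(k), repeat=n) if len(set(lab)) == k
]

# Set of cost changes caused by the shift, collected over all labelings.
differences = {
    pairwise_clustering_cost(D_shifted, lab, k) - pairwise_clustering_cost(D, lab, k)
    for lab in labelings
}
```

Under these assumptions the shift changes the cost of every partition into `k` non-empty clusters by the same amount, `d0 * (n - k) / 2`, so `differences` contains a single value and the minimizing partitions of `D` and `D_shifted` coincide. The abstract's further statement, that a suitable modification makes the matrix representable in a vector space, is not checked by this sketch.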

Author information

Correspondence to Volker Roth.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Roth, V., Fuchs, T.J., Vogt, J.E., Prabhakaran, S., Buhmann, J.M. (2013). Structure Preserving Embedding of Dissimilarity Data. In: Pelillo, M. (eds) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5628-4_7

  • DOI: https://doi.org/10.1007/978-1-4471-5628-4_7

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5627-7

  • Online ISBN: 978-1-4471-5628-4

  • eBook Packages: Computer Science, Computer Science (R0)
