Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining

Simovici, Dan A.; Vetro, Rosanne; Hua, Kaixun

doi:10.1007/978-3-319-45763-5_8

Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining

Dan A. Simovici⁵,
Rosanne Vetro⁵ &
Kaixun Hua⁵

Chapter
First Online: 04 November 2016

681 Accesses
1 Citations

Part of the book series: Studies in Computational Intelligence ((SCI,volume 665))

Abstract

We introduce a measure of ultrametricity for dissimilarity spaces and examine transformations of dissimilarities that impact this measure. Then, we study the influence of ultrametricity on the behavior of two classes of data mining algorithms (kNN classification and PAM clustering) applied on dissimilarity spaces. We show that there is an inverse variation between ultrametricity and performance of classifiers. For clustering, increased ultrametricity generate clusterings with better separation. Lowering ultrametricity produces more compact clusters.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Hardcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Amice, Y. (1975). Les nombres p-adiques. Paris: Presses Universitaires de France.
MATH Google Scholar
Barthélemy, J.-P., & Brucker, F. (2008). Binary clustering. Discrete Applied Mathematics, 156(8), 1237–1250.
Article MathSciNet MATH Google Scholar
Barthélemy, J.-P., Brucker, F., & Osswald, C. (2004). Combinatorial optimization and hierarchical classifications. 4OR, 2(3), 179–219.
Article MathSciNet MATH Google Scholar
Bertrand, P., & Janowitz, M. F. (2002). Pyramids and weak hierarchies in the ordinal model for clustering. Discrete Applied Mathematics, 122(1–3), 55–81.
Article MathSciNet MATH Google Scholar
Brucker, F. (2006). Sub-dominant theory in numerical taxonomy. Discrete Applied Mathematics, 154(7), 1085–1099.
Article MathSciNet MATH Google Scholar
Contreras, P., & Murtagh, F. (2012). Fast, linear time hierarchical clustering using the baire metric. Journal of Classification, 29(2), 118–143.
Article MathSciNet MATH Google Scholar
Deza, M. M., & Laurent, M. (1997). Geometry of cuts and metrics. Heidelberg: Springer.
Book MATH Google Scholar
Di Summa, M., Pritchard, D., & Sanità, L. (2015). Finding the closest ultrametric. Discrete Applied Mathematics, 180, 70–80.
Article MathSciNet MATH Google Scholar
Diatta, J., & Fichet, B. (1998). Quasi-ultrametrics and their 2-ball hypergraphs. Discrete Mathematics, 192(1–3), 87–102.
Article MathSciNet MATH Google Scholar
Gordon, A. D. (1981). Classification. London: Chapman and Hall.
MATH Google Scholar
Gordon, A. D. (1987). A review of hierarchical classification. Journal of the Royal Statistical Society, Series (A), 150(2), 119–137.
Article MathSciNet MATH Google Scholar
Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. New York: Wiley.
MATH Google Scholar
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Book MATH Google Scholar
Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Book Google Scholar
Leclerc, B. (1985). La comparaison des hiérarchies: indices et métriques. Mathématiques et sciences humaines, 92, 5–40.
MATH Google Scholar
Lerman, I. C. (1981). Classification et Analyse Ordinale des Données. Paris: Dunod.
MATH Google Scholar
Liu, Y., Li, Z., Xiong, H., Gao, X., & Wu, J. (2010). Understanding of internal clustering validation measures. In 2010 IEEE 10th international conference on data mining (pp. 911–916). IEEE.
Google Scholar
Manning, C. D., Raghwan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Book MATH Google Scholar
Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650–1654.
Article Google Scholar
Murtagh, F., Downs, G., & Contreras, P. (2008). Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding. SIAM Journal on Scientific Computing, 30(2), 707–730.
Article MathSciNet MATH Google Scholar
Ninio, J. (1983). Molecular approaches to evolution. Princeton: Princeton University.
Book Google Scholar
Rammal, R., Angles d’Auriac, C., & Doucot, D. (1985). On the degree of ultrametricity. Le Journal de Physique - Letteres, 45, 945–952.
Article Google Scholar
Rammal, R., Toulouse, G., & Virasoro, M. A. (1986). Ultrametricity for physicists. Reviews of Modern Physics, 58, 765–788.
Article MathSciNet Google Scholar
Schikhof, W. H. (1984). Ultrametric calculus. Cambridge: Cambridge University Press.
MATH Google Scholar
Simovici, D. A., & Djeraba, C. (2014). Mathematical tools for data mining (2nd ed.). London: Springer.
Google Scholar
Tang, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Reading: Addison-Wesley.
Google Scholar
Xiong, H., Wu, J., & Chen, J. (2009). K-means clustering versus validation measures: a data-distribution perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(2), 318–331.
Article Google Scholar
Zhao, Y., & Karypis, G. (2002). Evaluation of hierarchical clustering algorithms for document datasets. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (pp. 515–524). ACM.
Google Scholar

Download references

Acknowledgments

The authors wish to thank the referees for the careful reading of the paper and for their suggestions and observations.

Author information

Authors and Affiliations

University of Massachusetts Boston, Boston, USA
Dan A. Simovici, Rosanne Vetro & Kaixun Hua

Authors

Dan A. Simovici
View author publications
You can also search for this author in PubMed Google Scholar
Rosanne Vetro
View author publications
You can also search for this author in PubMed Google Scholar
Kaixun Hua
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dan A. Simovici .

Editor information

Editors and Affiliations

Polytech Nantes, University of Nantes, Nantes Cedex 3, France
Fabrice Guillet
University of Bordeaux, Talence Cedex, France
Bruno Pinaud
Polytechnics Graduate School, University of Tours, Tours, France
Gilles Venturini

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Simovici, D.A., Vetro, R., Hua, K. (2017). Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_8

Download citation

DOI: https://doi.org/10.1007/978-3-319-45763-5_8
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics