Clustering Based on Kolmogorov Information

Said Fouchal, Murat Ahat, Ivan Lavallée, Marc Bui, Sofiane Benamor

doi:10.1007/978-3-642-15387-7_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6276)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

In this paper we show how to reduce the computational cost of Clustering by Compression, proposed by Cilibrasi & Vitànyi, from O(n⁴) to O(n²). To that end, we adapt the Weighted Paired Group Method using Averages (WPGMA) to the same compression-based similarity measure used in Clustering by Compression. As a result, our approach easily classifies thousands of objects, whereas the algorithm proposed by Cilibrasi & Vitànyi reaches its limits at around a hundred objects. We also give experimental results.
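The combination described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the similarity measure is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), as defined in Clustering by Compression, and it uses zlib as a stand-in for the compressor C. The function names `csize`, `ncd`, and `wpgma` are hypothetical. The merge loop is a naive version that rescans every pair at each step, so it does not reach the O(n²) bound quoted above.

```python
import zlib
from itertools import combinations


def csize(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def wpgma(items):
    """Naive WPGMA over NCD; returns merges as (id_a, id_b, distance).

    Inputs get ids 0..n-1; each merge creates the next unused id.
    """
    clusters = {i: (i,) for i in range(len(items))}  # id -> member indices
    dist = {
        frozenset((i, j)): ncd(items[i], items[j])
        for i, j in combinations(clusters, 2)
    }
    merges = []
    next_id = len(items)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=dist.__getitem__,
        )
        a, b = sorted(pair)
        d_ab = dist[pair]
        merged = clusters.pop(a) + clusters.pop(b)
        # WPGMA update: distance to the new cluster is the plain average
        # of the distances to its two children, regardless of their sizes.
        for k in clusters:
            dist[frozenset((next_id, k))] = 0.5 * (
                dist[frozenset((a, k))] + dist[frozenset((b, k))]
            )
        clusters[next_id] = merged
        merges.append((a, b, d_ab))
        next_id += 1
    return merges


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    ramp = bytes(range(256)) * 4
    data = [text, text + b"again", ramp, ramp + bytes(range(10))]
    for a, b, d in wpgma(data):
        print(f"merge {a} + {b} at distance {d:.3f}")
```

The list returned by `wpgma` encodes the dendrogram: the first n - 1 ids are the input objects and every merge introduces a new internal node. For larger inputs, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="weighted"` applies the same WPGMA update rule to a precomputed distance matrix.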

References

Abrahams, J.: Code and parse trees for lossless source encoding. In: Proceedings of Compression and Complexity of Sequences, vol. 7.1, pp. 198–222 (1997)
Bennett, C.H., Gàcs, P., Li, M., Vitànyi, P.M.B., Zurek, W.: Information Distance. IEEE Transactions on Information Theory 44(4), 1407–1423 (1998)
Cilibrasi, R.: Statistical Inference Through Data Compression. PhD thesis, Amsterdam University (2007)
Cilibrasi, R., Vitànyi, P.M.B.: The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering 19(3), 370–383 (2007)
Cilibrasi, R., Vitànyi, P.M.B.: Clustering by compression. IEEE Transactions on Information Theory 51(4) (2005)
Cilibrasi, R., Vitànyi, P.M.B.: A New Quartet Tree Heuristic for Hierarchical Clustering. In: IEEE/ACM Trans. Computat. Biol. Bioinf.; Presented at the EU-PASCAL Statistics and Optimization of Clustering Workshop, London, UK (2005) (submitted)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley & Sons, Chichester (1991)
Delahaye, J.P., Zenil, H.: Towards a stable definition of Kolmogorov-Chaitin complexity. Fundamenta Informaticae, 1–15 (2008)
Delahaye, J.P.: Complexités, Aux limites des mathématiques et de l’informatique. In: Belin, pour la science (2006)
Gronau, I., Moran, S.: Optimal implementations of UPGMA and other common clustering algorithms. Information Processing Letters 104(6), 205–210 (2007)
Guindon, S.: Méthodes et algorithmes pour l’approche statistique en phylogénie. PhD thesis, Université Montpellier II (2003)
Huffman, D.A.: A method for the construction of minimum redundancy codes. In: Proceedings of the IRE, pp. 1098–1101 (1951)
Lavallée, I.: Complexité et algorithmique avancée “une introduction”. In: 2ème édition Hermann éditeurs (2008)
Levorato, V., Le, T.V., Lamure, M., Bui, M.: Classification prétopologique basée sur la complexité de Kolmogorov. Studia Informatica 7.1, 198–222 (2009)
Levorato, V.: Contributions à la Modélisation des Réseaux Complexes: Prétopologie et Applications. PhD thesis, Université de Paris 8, Paris (2008)
Loewenstein, Y., Portugaly, E., Former, M.L., Linial, M.: Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics, 145–171 (2008)
Li, M., Chen, X., Li, X., Ma, B., Vitànyi, P.M.B.: The similarity metric. IEEE Transactions on Information Theory 50(12) (2007)
Li, M., Vitànyi, P.M.B.: An introduction to Kolmogorov Complexity and its applications, 2nd edn. Springer, Heidelberg (1997)
Murtagh, F.: Complexities of hierarchic clustering algorithms: State of art. Computational Statistics Quarterly 1(2), 101–113 (1984)
Salemi, M., Vandamme, A.M.: The phylogenetic handbook: a practical approach to DNA and protein phylogeny. The Press Syndicate of the University of Cambridge (2003)
Varré, J.S., Delahaye, J.P., Rivals, E.: Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics 15(3), 194–202 (1998)

Author information

Authors and Affiliations

Laboratoire d’Informatique et des Systèmes Complexes, 41, rue Gay Lussac, 75005, Paris
Fouchal Said, Ahat Murat, Lavallée Ivan, Bui Marc & Benamor Sofiane
CNRS UMI ESS 3189 UCAD Dakar BP 5005
Fouchal Said, Ahat Murat, Lavallée Ivan, Bui Marc & Benamor Sofiane

Editor information

Editors and Affiliations

School of Engineering, The Parade, Cardiff University, CF24 3AA, Cardiff, UK
Rossitza Setchi
Dept. of Computer Science and Software Engineering, Buckingham Building, Lion Terrace, University of Portsmouth, PO1 3HE, Portsmouth, UK
Ivan Jordanov
KES International, 145-157, St. John Street, EC1V 4PY, London, UK
Robert J. Howlett
School of Electrical and Information Engineering, University of South Australia, Mawson Lakes Campus, Adelaide, 5095, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Said, F., Murat, A., Ivan, L., Marc, B., Sofiane, B. (2010). Clustering Based on Kolmogorov Information. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15387-7_49

DOI: https://doi.org/10.1007/978-3-642-15387-7_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15386-0
Online ISBN: 978-3-642-15387-7
eBook Packages: Computer Science (R0)