Abstract
In this paper we show how to reduce the computational cost of Clustering by Compression, proposed by Cilibrasi & Vitányi, from O(n^4) to O(n^2). To that end, we adapt the Weighted Paired Group Method using Averages (WPGMA) to the same compression-based similarity measure used in Clustering by Compression. As a result, our approach easily classifies thousands of objects, whereas the algorithm proposed by Cilibrasi & Vitányi reaches its limits at around a hundred objects. We also give experimental results.
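To make the combination described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes zlib as the compressor and uses the normalized compression distance (NCD) of Cilibrasi & Vitányi as the similarity measure; the names `csize`, `ncd`, and `wpgma` are hypothetical. The closest-pair search here is naive, so this version runs in roughly O(n^3) time and only illustrates the WPGMA update rule; reaching the O(n^2) bound would need a more careful data structure.

```python
import zlib
from itertools import combinations


def csize(data: bytes) -> int:
    """Compressed length in bytes; zlib is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def wpgma(items):
    """Naive WPGMA over NCD; returns a nested-tuple tree of item indices."""
    clusters = {i: i for i in range(len(items))}  # cluster id -> subtree
    dist = {frozenset((i, j)): ncd(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}
    next_id = len(items)
    while len(clusters) > 1:
        # Find the closest pair of active clusters (naive scan).
        a, b = min(combinations(clusters, 2),
                   key=lambda p: dist[frozenset(p)])
        # WPGMA update: the distance from the merged cluster to any other
        # cluster is the unweighted mean of the two old distances.
        for k in clusters:
            if k not in (a, b):
                dist[frozenset((next_id, k))] = (
                    dist[frozenset((a, k))] + dist[frozenset((b, k))]) / 2
        clusters[next_id] = (clusters.pop(a), clusters.pop(b))
        next_id += 1
    return next(iter(clusters.values()))
```

Calling `wpgma` on a list of byte strings returns a binary tree whose leaves are the positions of the inputs in that list. Only n(n-1)/2 compressions of concatenated pairs are needed to fill the initial distance table, which is where the quadratic cost in the number of objects comes from.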
References
Abrahams, J.: Code and parse trees for lossless source encoding. In: Proceedings of Compression and Complexity of Sequences, vol. 7.1, pp. 198–222 (1997)
Bennett, C.H., Gács, P., Li, M., Vitányi, P.M.B., Zurek, W.: Information Distance. IEEE Transactions on Information Theory 44(4), 1407–1423 (1998)
Cilibrasi, R.: Statistical Inference Through Data Compression. PhD thesis, Amsterdam University (2007)
Cilibrasi, R., Vitányi, P.M.B.: The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering 19(3), 370–383 (2007)
Cilibrasi, R., Vitányi, P.M.B.: Clustering by compression. IEEE Transactions on Information Theory 51(4) (2005)
Cilibrasi, R., Vitányi, P.M.B.: A New Quartet Tree Heuristic for Hierarchical Clustering. In: IEEE/ACM Trans. Computat. Biol. Bioinf.; Presented at the EU-PASCAL Statistics and Optimization of Clustering Workshop, London, UK (2005) (submitted)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley & Sons, Chichester (1991)
Delahaye, J.P., Zenil, H.: Towards a stable definition of Kolmogorov-Chaitin complexity. Fundamenta Informaticae, 1–15 (2008)
Delahaye, J.P.: Complexités, Aux limites des mathématiques et de l’informatique. In: Belin, pour la science (2006)
Gronau, I., Moran, S.: Optimal implementations of UPGMA and other common clustering algorithms. Information Processing Letters 104(6), 205–210 (2007)
Guindon, S.: Méthodes et algorithmes pour l’approche statistique en phylogénie. PhD thesis, Université Montpellier II (2003)
Huffman, D.A.: A method for the construction of minimum redundancy codes. In: Proceedings of the IRE, pp. 1098–1101 (1952)
Lavallée, I.: Complexité et algorithmique avancée “une introduction”, 2ème édition. Hermann éditeurs (2008)
Levorato, V., Le, T.V., Lamure, M., Bui, M.: Classification prétopologique basée sur la complexité de Kolmogorov. Studia Informatica 7.1, 198–222 (2009)
Levorato, V.: Contributions à la Modélisation des Réseaux Complexes: Prétopologie et Applications. PhD thesis, Université de Paris 8, Paris (2008)
Loewenstein, Y., Portugaly, E., Former, M.L., Linial, M.: Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics, 145–171 (2008)
Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.M.B.: The similarity metric. IEEE Transactions on Information Theory 50(12) (2004)
Li, M., Vitányi, P.M.B.: An Introduction to Kolmogorov Complexity and Its Applications, 2nd edn. Springer, Heidelberg (1997)
Murtagh, F.: Complexities of hierarchic clustering algorithms: State of the art. Computational Statistics Quarterly 1(2), 101–113 (1984)
Salemi, M., Vandamme, A.M.: The phylogenetic handbook: a practical approach to DNA and protein phylogeny. The Press Syndicate of the University of Cambridge (2003)
Varré, J.S., Delahaye, J.P., Rivals, E.: Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics 15(3), 194–202 (1998)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Said, F., Murat, A., Ivan, L., Marc, B., Sofiane, B. (2010). Clustering Based on Kolmogorov Information. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15387-7_49
DOI: https://doi.org/10.1007/978-3-642-15387-7_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15386-0
Online ISBN: 978-3-642-15387-7