Information Distance and Its Applications

  • Conference paper
Implementation and Application of Automata (CIAA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4094)

Abstract

We summarize recent developments in the general theory of information distance and its applications to whole-genome phylogeny, document comparison, Internet query-answer systems, and many other data mining tasks. We also solve an open problem regarding the universality of the normalized information distance.
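As a rough illustration of how a normalized information distance can be used in practice: the quantity max{K(x|y), K(y|x)} / max{K(x), K(y)} is defined via Kolmogorov complexity K, which is not computable, so applications typically substitute a real compressor. The sketch below is not taken from this paper. It assumes zlib as the stand-in compressor and uses the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The function names `compressed_size` and `ncd` are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    Assumption: zlib is used as a crude, computable stand-in for
    Kolmogorov complexity; any real compressor could be swapped in.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    # Deterministic pseudo-random bytes, unrelated to `text`.
    rng = random.Random(0)
    noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

    print("ncd(text, text)  =", round(ncd(text, text), 3))
    print("ncd(text, noise) =", round(ncd(text, noise), 3))
```

With a real compressor the distance of a string to itself is small but generally not exactly zero, so the result is best read as a relative score for ranking or clustering rather than as an exact metric.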

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, M. (2006). Information Distance and Its Applications. In: Ibarra, O.H., Yen, HC. (eds) Implementation and Application of Automata. CIAA 2006. Lecture Notes in Computer Science, vol 4094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11812128_1

  • DOI: https://doi.org/10.1007/11812128_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37213-4

  • Online ISBN: 978-3-540-37214-1

  • eBook Packages: Computer Science (R0)
