Skip to main content

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7070))

  • 1726 Accesses

Abstract

In this expository article, we discuss a complementary notion of information distance for partial matches. Information Distance D max (x,y) =  max { K(x|y), K(y|x) } measures the distance between two objects by the minimum number of bits that are needed to convert them to each other. However, in many applications, often some objects contain too much irrelevant information which overwhelms the relevant information we are interested in. Information distance based on partial information also should not satisfy the triangle inequality. To deal with such problems, we have introduced an information distance for partial matching. It turns out the new notion is precisely D min (x,y) =  min { K(x|y), K(y|x) }, complementary to the D max distance. I will give some recent applications of the D min distance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Ané, C., Sanderson, M.J.: Missing the Forest for the Trees: Phylogenetic Compression and Its Implications for Inferring Complex Evolutionary Histories. Systematic Biology 54(1), 146–157 (2005)

    Article  Google Scholar 

  2. Bennett, C.H., Gacs, P., Li, M., Vitanyi, P., Zurek, W.: Information Distance. IEEE Trans. Inform. Theory 44(4), 1407–1423 (1998) (STOC 1993)

    Google Scholar 

  3. Bennett, C.H., Li, M., Ma, B.: Chain letters and evolutionary histories. Scientific American 288(6), 76–81 (2003) (feature article)

    Google Scholar 

  4. Chaitin, G.J.: On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers. Journal of the ACM 16(3), 407

    Google Scholar 

  5. Chen, X., Francia, B., Li, M., Mckinnon, B., Seker, A.: Shared information and program plagiarism detection. IEEE Trans. Information Theory 50(7), 1545–1550 (2004)

    Article  MathSciNet  Google Scholar 

  6. Chernov, A.V., Muchnik, A.A., Romashchenko, A.E., Shen, A.K., Vereshchagin, N.K.: Upper semi-lattice of binary strings with the relation “x is simple conditional to y”. Theoret. Comput. Sci. 271, 69–95 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  7. Cilibrasi, R., Vitányi, P.M.B., de Wolf Algorithmic, R.: clustring of music based on string compression. Comput. Music J. 28(4), 49–67 (2004)

    Article  Google Scholar 

  8. Cilibrasi, R., Vitányi, P.M.B.: The Google similarity distance. IEEE Trans. Knowledge and Data Engineering 19(3), 370–383 (2007)

    Article  Google Scholar 

  9. Cilibrasi, R., Vitányi, P.M.B.: Clustering by compression. IEEE Trans. Inform. Theory 51(4), 1523–1545 (2005)

    Article  MathSciNet  Google Scholar 

  10. Cuturi, M., Vert, J.P.: The context-tree kernel for strings. Neural Networks 18(4), 1111–1123 (2005)

    Article  Google Scholar 

  11. Fagin, R., Stockmeyer, L.: Relaxing the triangle inequality in pattern matching. Int’l J. Comput. Vision 28(3), 219–231 (1998)

    Article  Google Scholar 

  12. Keogh, E., Lonardi, S., Ratanamahatana, C.A.: Towards parameter-free data mining. In: KDD 2004, pp. 206–215 (2004)

    Google Scholar 

  13. Kirk, S.R., Jenkins, S.: Information theory-baed software metrics and obfuscation. J. Systems and Software 72, 179–186 (2004)

    Article  Google Scholar 

  14. Kolmogorov, A.N.: Three Approaches to the Quantitative Definition of Information. Problems Inform. Transmission 1(1), 1–7 (1965)

    MathSciNet  Google Scholar 

  15. Kraskov, A., Stögbauer, H., Andrzejak, R.G., Grassberger, P.: Hierarchical clustering using mutual information. Europhys. Lett. 70(2), 278–284 (2005)

    Article  MathSciNet  Google Scholar 

  16. Kocsor, A., Kertesz-Farkas, A., Kajan, L., Pongor, S.: Application of compression-based distance measures to protein sequence classification: a methodology study. Bioinformatics 22(4), 407–412 (2006)

    Article  Google Scholar 

  17. Krasnogor, N., Pelta, D.A.: Measuring the similarity of protein structures by means of the universal similarity metric. Bioinformatics 20(7), 1015–1021 (2004)

    Article  Google Scholar 

  18. Li, M., Badger, J., Chen, X., Kwong, S., Kearney, P., Zhang, H.: An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17(2), 149–154 (2001)

    Article  Google Scholar 

  19. Li, M., Chen, X., Li, X., Ma, B., Vitanyi, P.M.B.: The similarity metric. IEEE Trans. Information Theory 50(12), 3250–3264 (2004)

    Article  MathSciNet  Google Scholar 

  20. Li, M.: Information distance and its applications. Int’l J. Found. Comput. Sci. 18(4), 669–681 (2007)

    Article  MATH  Google Scholar 

  21. Li, M., Vitanyi, P.: An introduction to Kolmogorov complexity and its applications, 3rd edn. Springer (2008)

    Google Scholar 

  22. Muchnik, A.A.: Conditional comlexity and codes. Theoretical Computer Science 271(1), 97–109 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  23. Muchnik, A.A., Vereshchagin, N.K.: Logical operations and Kolmogorov complexity II. In: Proc. 16th Conf. Comput. Complexity, pp. 256–265 (2001)

    Google Scholar 

  24. Nykter, M., Price, N.D., Larjo, A., Aho, T., Kauffman, S.A., Yli-Harja, O., Shmulevich, I.: Critical networks exhibit maximal information diversity in structure-dynamics relationships. Phy. Rev. Lett. 100, 058702(4) (2008)

    Google Scholar 

  25. Nykter, M., Price, N.D., Aldana, M., Ramsey, S.A., Kauffman, S.A., Hood, L.E., Yli-Harja, O., Shmulevich, I.: Gene expression dynamics in the macrophage exhibit criticality. Proc. Nat. Acad. Sci. USA 105(6), 1897–1900 (2008)

    Article  Google Scholar 

  26. Otu, H.H., Sayood, K.: Bioinformatics 19(6), 2122–2130 (2003); A new sequence distance measure for phylogenetic tree construction

    Google Scholar 

  27. Pao, H.K., Case, J.: Computing entropy for ortholog detection. In: Int’l Conf. Comput. Intell., Istanbul, Turkey, December 17-19 (2004)

    Google Scholar 

  28. Parry, D.: Use of Kolmogorov distance identification of web page authorship, topic and domain. In: Workshop on Open Source Web Inf. Retrieval (2005), http://www.emse.fr/OSWIR05

  29. Costa Santos, C., Bernardes, J., Vitányi, P.M.B., Antunes, L.: Clustering fetal heart rate tracings by compression. In: Proc. 19th IEEE Intn’l Symp. Computer-Based Medical Systems, Salt Lake City, Utah, June 22-23 (2006)

    Google Scholar 

  30. Shen, A.K., Vereshchagin, N.K.: Logical operations and Kolmogorov complexity. Theoret. Comput. Sci. 271, 125–129 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  31. Solomonoff, R.: A Formal Theory of Inductive Inference, Part I. d Information and Control 7(1), 1–22 (1964)

    Article  MathSciNet  MATH  Google Scholar 

  32. Solomonoff, R.: A Formal Theory of Inductive Inference, Part II. Information and Control 7(2), 224–254 (1964)

    Article  MathSciNet  MATH  Google Scholar 

  33. Varre, J.S., Delahaye, J.P., Rivals, E.: Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics 15(3), 194–202 (1999)

    Article  Google Scholar 

  34. Veltkamp, R.C.: Shape Matching: Similarity Measures and Algorithms, invited talk. In: Proc. Int’l Conf. Shape Modeling Applications 2001, Italy, pp. 188–197 (2001)

    Google Scholar 

  35. Vereshchagin, N.K., V’yugin, M.V.: Independent minimum length programs to translate between given strings. Theoret. Comput. Sci. 271, 131–143 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  36. V’yugin, M.V.: Information distance and conditional complexities. Theoret. Comput. Sci. 271, 145–150 (2002)

    Article  MathSciNet  Google Scholar 

  37. Wallace, C.S., Dowe, D.L.: Minimum Message Length and Kolmogorov Complexity. Computer Journal 42(4) (1999)

    Google Scholar 

  38. Yang, T., Wang, D., Zhu, X., Li, M.: Information distance between what I said and what it heard. Manuscript in preparation (August. 2011)

    Google Scholar 

  39. Zhang, X., Hao, Y., Zhu, X., Li, M.: Information distance from a question to an answer. In: Proc. 13th ACM SIGKDD, August 12-15, pp. 874–883 (2007)

    Google Scholar 

  40. Zhang, X., Hao, Y., Zhu, X., Li, M.: New information measure and its application in question answering system. J. Comput. Sci. Tech. 23(4), 557–572 (2008)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, M. (2013). Partial Match Distance. In: Dowe, D.L. (eds) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-44958-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44957-4

  • Online ISBN: 978-3-642-44958-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics