Abstract
In this expository article, we discuss a notion of information distance for partial matches that complements the standard one. The information distance D_max(x,y) = max{K(x|y), K(y|x)}, where K(x|y) denotes the conditional Kolmogorov complexity of x given y, measures the distance between two objects by the minimum number of bits needed to convert either one into the other. In many applications, however, some objects contain so much irrelevant information that it overwhelms the relevant information we are interested in. Moreover, an information distance based on partial information should not be required to satisfy the triangle inequality. To deal with such problems, we have introduced an information distance for partial matching. The new notion turns out to be precisely D_min(x,y) = min{K(x|y), K(y|x)}, complementary to the D_max distance. We also present some recent applications of the D_min distance.
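The contrast between the two distances can be illustrated with a small sketch. Kolmogorov complexity is not computable, so the code below assumes a common heuristic that is not taken from this chapter: K(x|y) is estimated as C(y + x) - C(y), where C is the zlib-compressed length in bytes. The function names (c, k_cond, d_max, d_min) and the test strings are hypothetical and serve only as illustration.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for K."""
    return len(zlib.compress(data, 9))


def k_cond(x: bytes, y: bytes) -> int:
    """Rough estimate of K(x|y) as C(y + x) - C(y), clamped at zero."""
    return max(0, c(y + x) - c(y))


def d_max(x: bytes, y: bytes) -> int:
    """Approximate D_max(x,y) = max{K(x|y), K(y|x)}."""
    return max(k_cond(x, y), k_cond(y, x))


def d_min(x: bytes, y: bytes) -> int:
    """Approximate D_min(x,y) = min{K(x|y), K(y|x)}."""
    return min(k_cond(x, y), k_cond(y, x))


# x is a short, relevant object; y contains x plus a large amount of
# irrelevant (pseudo-random, hence nearly incompressible) information.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(2000))
x = b"the quick brown fox jumps over the lazy dog " * 5
y = x + noise

print("approx D_max:", d_max(x, y))
print("approx D_min:", d_min(x, y))
```

Under these assumptions, the estimate of K(y|x) is dominated by the irrelevant noise in y, so the approximate D_max is large, while the estimate of K(x|y) is small because x is fully contained in y, so the approximate D_min stays small. A real compressor only gives a rough upper-bound style estimate, so the numbers are indicative, not exact.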
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, M. (2013). Partial Match Distance. In: Dowe, D.L. (eds) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_4
DOI: https://doi.org/10.1007/978-3-642-44958-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44957-4
Online ISBN: 978-3-642-44958-1
eBook Packages: Computer Science, Computer Science (R0)