Abstract
Jensen-Shannon divergence is a symmetrised, smoothed version of Kullback-Leibler divergence. It has been shown to be the square of a proper distance metric, and it has other properties that make it an excellent choice for many high-dimensional spaces in ℝ*.
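A minimal sketch of the definitions in the paragraph above, assuming dense probability vectors (non-negative, summing to 1) and base-2 logarithms, so that JSD lies in [0, 1]. The helper names `kl`, `jsd` and `js_distance` are illustrative, not taken from the paper. The "smoothing" is visible in `jsd`: KL is only ever taken against the midpoint m, which is non-zero wherever p or q is, so the result is always finite, unlike KL(p || q) itself. The metric referred to in the text is the square root of JSD.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i == 0 contribute 0 by convention. Callers must ensure
    q_i > 0 wherever p_i > 0, otherwise the divergence is infinite.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL divergence of p and q from their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i >= p_i / 2 and m_i >= q_i / 2, so neither KL term can blow up.
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of JSD, the quantity that satisfies the metric axioms."""
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, jsd(p, q)))


if __name__ == "__main__":
    print(jsd([1.0, 0.0], [0.0, 1.0]))  # disjoint supports give the maximum, 1.0
    print(jsd([0.5, 0.5], [0.5, 0.5]))  # identical inputs give 0.0
```

Note that this dense version visits every dimension, which is exactly the cost the abstract sets out to reduce.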
The metric as defined is, however, expensive to evaluate. In sparse spaces over many dimensions, the intrinsic dimensionality of the metric space is typically very high, making similarity-based indexing ineffective. Exhaustive search over large data collections may therefore be infeasible.
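To make "intrinsic dimensionality" concrete, the sketch below uses a commonly cited estimator, ρ = μ² / (2σ²), where μ and σ² are the mean and variance of the pairwise distances in a sample. This choice of estimator is an assumption; the abstract does not say which measure the authors use. A high ρ means distances concentrate tightly around their mean, so triangle-inequality pruning from pivots excludes few candidates. The function name and signature are hypothetical.

```python
import itertools
import statistics
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def intrinsic_dimensionality(points: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Estimate rho = mu^2 / (2 * sigma^2) over all pairwise distances in `points`.

    Returns infinity when every pair is equidistant (zero variance).
    Cost is quadratic in len(points), so pass a modest sample.
    """
    ds = [dist(a, b) for a, b in itertools.combinations(points, 2)]
    mu = statistics.fmean(ds)
    var = statistics.pvariance(ds, mu)  # population variance about the known mean
    return float("inf") if var == 0 else mu * mu / (2 * var)


if __name__ == "__main__":
    # Pairwise distances on a line: 1, 3 and 2, so mu = 2 and sigma^2 = 2/3.
    print(intrinsic_dimensionality([0.0, 1.0, 3.0], lambda x, y: abs(x - y)))
```

Passing a Jensen-Shannon distance function and a sample of sparse vectors in place of the toy line would let one check the claim for a particular collection.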
Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of the function can be dramatically reduced.
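The property can be illustrated as follows; this is our own derivation under stated assumptions, not necessarily the authors' exact formulation. Take p and q to be probability vectors and use base-2 logarithms. A dimension where only p is non-zero contributes exactly p_i to the sum 2·JSD, and similarly for q. Because each vector sums to 1, those one-sided contributions add up to 2 minus the shared mass, which collapses to a constant. The result is

JSD(p, q) = 1 − ½ Σ_{i : p_i q_i > 0} [ h(p_i) + h(q_i) − h(p_i + q_i) ],   with h(x) = −x log₂ x,

so only dimensions non-zero in both arguments need to be visited. The sketch stores sparse vectors as `dict`s from dimension index to value (a representation chosen here for illustration) and iterates over the smaller one. The threshold function mentioned in the abstract is not specified there and is therefore not shown.

```python
import math
from typing import Dict

# Hypothetical sparse representation: dimension index -> non-zero probability.
SparseVec = Dict[int, float]


def h(x: float) -> float:
    """Entropy term -x * log2(x), defined here for x > 0."""
    return -x * math.log2(x)


def jsd_sparse(p: SparseVec, q: SparseVec) -> float:
    """Jensen-Shannon divergence of two sparse probability vectors.

    Assumes all stored values are positive and each vector sums to 1.
    Dimensions non-zero in only one vector are never visited; their
    combined contribution is the constant folded into the leading 1.
    """
    if len(p) > len(q):
        p, q = q, p  # JSD is symmetric, so iterate over the shorter vector
    acc = 0.0
    for i, pi in p.items():
        qi = q.get(i)
        if qi is not None:  # dimension is non-zero in both arguments
            acc += h(pi) + h(qi) - h(pi + qi)
    return 1.0 - 0.5 * acc


if __name__ == "__main__":
    # Only dimension 1 is shared, so only one term is evaluated.
    print(jsd_sparse({0: 0.5, 1: 0.5}, {1: 0.5, 2: 0.5}))  # 0.5
```

With dictionary lookups the cost is proportional to the number of non-zeros in the shorter vector rather than to the full dimensionality. A merge over two sorted index arrays would achieve the same effect without hashing.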
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Connor, R., Cardillo, F.A., Moss, R., Rabitti, F. (2013). Evaluation of Jensen-Shannon Distance over Sparse Data. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_16
DOI: https://doi.org/10.1007/978-3-642-41062-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41061-1
Online ISBN: 978-3-642-41062-8
eBook Packages: Computer Science, Computer Science (R0)