Parametric Approximation Algorithms for High-Dimensional Euclidean Similarity

Eğecioğlu, Ömer

doi:10.1007/3-540-44794-6_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2168)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

We introduce a spectrum of algorithms for measuring the similarity of high-dimensional vectors in Euclidean space. The algorithms proposed consist of a convex combination of two measures: one which contains summary data about the shape of a vector, and the other about the relative magnitudes of the coordinates. The former is based on a concept called bin-score permutations and a metric to quantify similarity of permutations, the latter on another novel approximation for inner-product computations based on power symmetric functions, which generalizes the Cauchy-Schwarz inequality. We present experiments on time-series data on labor statistics unemployment figures that show the effectiveness of the algorithm as a function of the parameter that combines the two parts.
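
The convex combination described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's definitions, so everything concrete here is an assumption made for illustration: `bin_scores` sums contiguous blocks of coordinates, the permutation metric is a normalized inversion count, and `magnitude_distance` is a simple norm-based stand-in rather than the power-symmetric-function approximation the paper proposes. All function names and the parameter name `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def bin_scores(x: Sequence[float], bins: int) -> List[float]:
    """Sum the coordinates of x over `bins` contiguous, near-equal blocks (assumed definition)."""
    n = len(x)
    return [sum(x[i * n // bins:(i + 1) * n // bins]) for i in range(bins)]


def score_permutation(scores: Sequence[float]) -> List[int]:
    """Bin indices ordered by increasing score; ties are broken by index."""
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))


def inversion_distance(p: Sequence[int], q: Sequence[int]) -> float:
    """Fraction of index pairs that p and q order differently, in [0, 1]."""
    pos = {v: i for i, v in enumerate(q)}
    r = [pos[v] for v in p]
    m = len(r)
    if m < 2:
        return 0.0
    inversions = sum(1 for i in range(m) for j in range(i + 1, m) if r[i] > r[j])
    return inversions / (m * (m - 1) / 2)


def magnitude_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in magnitude measure (an assumption): relative gap of Euclidean norms, in [0, 1]."""
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    if nx + ny == 0.0:
        return 0.0
    return abs(nx - ny) / (nx + ny)


def combined_dissimilarity(x: Sequence[float], y: Sequence[float],
                           lam: float, bins: int = 4) -> float:
    """Convex combination: lam weights the shape part, (1 - lam) the magnitude part."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    shape = inversion_distance(score_permutation(bin_scores(x, bins)),
                               score_permutation(bin_scores(y, bins)))
    return lam * shape + (1.0 - lam) * magnitude_distance(x, y)


if __name__ == "__main__":
    rising = [1, 2, 3, 4, 5, 6, 7, 8]
    falling = list(reversed(rising))
    for lam in (0.0, 0.5, 1.0):
        print(lam, combined_dissimilarity(rising, falling, lam))
```

In this toy case the two vectors have the same norm but opposite shape, so the result moves from 0 to 1 as `lam` shifts weight from the magnitude part to the shape part, which is the kind of parameter dependence the experiments in the paper are said to examine.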

Supported in part by NSF Grant No. CCR–9821038.

Author information

Authors and Affiliations

Department of Computer Science, University of California, Santa Barbara, CA, 93106, USA
Ömer Eğecioğlu

Editor information

Editors and Affiliations

Department of Computer Science, Albert-Ludwigs University Freiburg, Georges Köhler-Allee, Geb. 079, 79110, Freiburg, Germany
Luc De Raedt
Inst. of Information and Computing Sciences, Dept. of Mathematics and Computer Science, University of Utrecht, Padualaan 14, de Uithof, 3508 TB, Utrecht, The Netherlands
Arno Siebes

Cite this paper

Eğecioğlu, Ö. (2001). Parametric Approximation Algorithms for High-Dimensional Euclidean Similarity. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_7

DOI: https://doi.org/10.1007/3-540-44794-6_7
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive
