DOI: 10.1145/1132516.1132597
Article

Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform

Published: 21 May 2006

Abstract

We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$), called the Fast Johnson-Lindenstrauss Transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the "Heisenberg principle" of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
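
As an illustration of the preconditioning idea described in the abstract, here is a minimal pure-Python sketch. The abstract does not give the construction, so the details are assumptions: the transform is taken to be x -> P H D x, where D is a random +1/-1 diagonal, H is the normalized Walsh-Hadamard matrix (standing in for the randomized Fourier transform), and P is a sparse k x d matrix whose entries are nonzero with probability q and then drawn from N(0, 1/q). The names `fwht` and `make_fjlt`, the parameter `q`, and the 1/sqrt(k) scaling are illustrative choices, not taken from the source.

```python
import math
import random


def fwht(vec):
    """Normalized Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine pairs of entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthogonal
    return [v * scale for v in out]


def make_fjlt(d, k, q, seed=0):
    """Sample D and P once and return the map x -> (1/sqrt(k)) * P H D x.

    Assumed construction (not stated in the abstract):
      D: d random signs, P: k x d, entry nonzero with probability q,
      nonzero entries drawn from N(0, 1/q).
    """
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    std = 1.0 / math.sqrt(q)
    # Store each sparse row of P as a list of (column, weight) pairs.
    rows = [
        [(j, rng.gauss(0.0, std)) for j in range(d) if rng.random() < q]
        for _ in range(k)
    ]

    def apply(x):
        if len(x) != d:
            raise ValueError("expected a vector of length %d" % d)
        # Precondition: random sign flips, then Walsh-Hadamard transform.
        y = fwht([s * v for s, v in zip(signs, x)])
        # Sparse projection, scaled so squared norms match in expectation.
        norm = math.sqrt(k)
        return [sum(w * y[j] for j, w in row) / norm for row in rows]

    return apply
```

For example, `make_fjlt(d=256, k=64, q=0.25)(x)` maps a 256-dimensional vector to 64 coordinates whose squared norm equals that of `x` in expectation. Applying H D first spreads the mass of a sparse input across all coordinates, so the sparse matrix P does not miss it.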


    Published In

    STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of Computing
    May 2006
    786 pages
    ISBN: 1595931341
    DOI: 10.1145/1132516
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Fourier transform
    2. Johnson-Lindenstrauss dimension reduction
    3. approximate nearest neighbor searching
    4. high-dimensional geometry

    Qualifiers

    • Article

    Conference

    STOC06
    Sponsor:
    STOC06: Symposium on Theory of Computing
    May 21 - 23, 2006
    Seattle, WA, USA

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
