An Efficient Exact Nearest Neighbor Search by Compounded Embedding

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Abstract

Nearest neighbor search (NNS) in high-dimensional space is a fundamental operation in applications from many domains, such as machine learning, databases, multimedia, and computer vision. In this paper, we first propose a novel and effective technique for computing a lower bound on Euclidean distance by combining linear and non-linear embedding methods. With this technique, each point in a high-dimensional space can be embedded into a low-dimensional space such that the distance between two embedded points lower bounds their distance in the original space. Following the filter-and-verify paradigm, we then develop an efficient exact NNS algorithm that prunes candidates using the new lower bound, thereby reducing the number of expensive distance computations in the high-dimensional space. Our comprehensive experiments on 10 diverse real-life datasets, including image, video, audio, and text data, demonstrate that the new algorithm can significantly outperform state-of-the-art exact NNS techniques.
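
The filter-and-verify idea can be illustrated with a minimal sketch. It is not the paper's compounded embedding, whose details are not given on this page. As a stand-in it uses only a linear embedding: projection onto a few orthonormal directions, which can never increase Euclidean distance and therefore yields a valid lower bound. The function names (`fit_projection`, `embed`, `exact_nn`) and the choice of principal directions via SVD are assumptions made for illustration.

```python
import numpy as np

def fit_projection(data, k):
    """Return (mean, basis); basis has k orthonormal columns (illustrative choice: top principal directions)."""
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k].T

def embed(points, mean, basis):
    """Project points into the k-dimensional space spanned by the basis columns."""
    return (points - mean) @ basis

def exact_nn(query, data, embedded, mean, basis):
    """Exact nearest neighbor by filter-and-verify; returns (index, distance, number verified)."""
    q_low = embed(query, mean, basis)
    # Filter step: cheap low-dimensional distances that lower bound the true ones.
    lower = np.linalg.norm(embedded - q_low, axis=1)
    best_idx, best_dist, verified = -1, np.inf, 0
    for i in np.argsort(lower):
        if lower[i] >= best_dist:
            break  # all remaining candidates have lower bounds at least this large
        # Verify step: full distance in the original space.
        d = np.linalg.norm(data[i] - query)
        verified += 1
        if d < best_dist:
            best_idx, best_dist = int(i), d
    return best_idx, best_dist, verified
```

Candidates are visited in increasing order of their lower bound, so the loop can stop as soon as a bound reaches the best verified distance. The result stays exact because no pruned point can be closer than the current best; how many full distance computations are saved depends on how tight the bound is for the data at hand.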

Notes

  1. http://hunch.net/~jl/projects/cover_tree/cover_tree.html

  2. http://research.yoonho.info/fnnne

  3. http://www.cs.toronto.edu/~kriz/cifar.html

  4. https://yadi.sk/d/I_yaFVqchJmoc

  5. http://groups.csail.mit.edu/vision/SUN/

  6. http://corpus-texmex.irisa.fr

  7. http://yann.lecun.com/exdb/mnist/

  8. http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

  9. http://phototour.cs.washington.edu/patches/default.htm

  10. http://www.cs.princeton.edu/cass/audio.tar.gz

  11. https://code.google.com/archive/p/word2vec/

  12. http://www.cs.tau.ac.il/~wolf/ytfaces/index.html

Acknowledgement

Ying Zhang is supported by ARC DE140100679 and DP170103710. Wei Wang is supported by ARC DP170103710 and by D2DCRC DC25002 and DC25003. Ivor W. Tsang is supported by ARC grants FT130100746, DP180100106, and LP150100671. Xuemin Lin is supported by NSFC 61672235, DP170101628, and DP180103096.

Corresponding author

Correspondence to Mingjie Li.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Li, M., Zhang, Y., Sun, Y., Wang, W., Tsang, I.W., Lin, X. (2018). An Efficient Exact Nearest Neighbor Search by Compounded Embedding. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_3

  • DOI: https://doi.org/10.1007/978-3-319-91452-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91451-0

  • Online ISBN: 978-3-319-91452-7

  • eBook Packages: Computer Science, Computer Science (R0)
