I/O Efficient Algorithm for c-Approximate Furthest Neighbor Search in High-Dimensional Space

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12114)

Abstract

Furthest neighbor search in high-dimensional space is widely used in applications such as recommendation systems. Because of the “curse of dimensionality”, exact search becomes prohibitively expensive, so \(c\)-approximate furthest neighbor (\(c\)-AFN) search is used instead, trading result accuracy for efficiency. However, most current techniques for external memory are suitable only for low-dimensional space.
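To make the accuracy trade-off concrete, the following is a minimal sketch of the standard \(c\)-AFN condition: a point counts as a \(c\)-approximate furthest neighbor of a query if its distance to the query is at least \(1/c\) times the distance of the exact furthest neighbor. The function names (`exact_furthest`, `is_c_afn`) are hypothetical and chosen only for illustration; the brute-force scan is a reference check, not the method proposed in the paper.

```python
import math


def exact_furthest(points, q):
    """Return the point with the largest Euclidean distance to q (linear scan)."""
    return max(points, key=lambda p: math.dist(p, q))


def is_c_afn(candidate, points, q, c):
    """Check the c-AFN condition: dist(candidate, q) >= dist(furthest, q) / c."""
    if c < 1:
        raise ValueError("the approximation ratio c must be at least 1")
    furthest_distance = math.dist(exact_furthest(points, q), q)
    return math.dist(candidate, q) >= furthest_distance / c


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 0.0)
print(is_c_afn((3.0, 4.0), points, q, c=2.0))  # True: 5 >= 10 / 2
print(is_c_afn((3.0, 4.0), points, q, c=1.5))  # False: 5 < 10 / 1.5
```

With \(c = 1\) the condition reduces to exact furthest neighbor search; larger values of \(c\) accept more points and so leave more room for a faster index.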

In this paper, we propose a novel algorithm, reverse incremental LSH (RI-LSH), which builds on Indyk’s LSH scheme and solves the problem with a theoretical guarantee. Unlike previous hashing-based methods, RI-LSH is designed for external memory and achieves a low I/O cost. We provide a rigorous theoretical analysis proving that RI-LSH returns a \(c\)-AFN result with constant probability. Our comprehensive experimental results show that, compared with other \(c\)-AFN methods that have theoretical guarantees, our algorithm achieves better I/O efficiency.
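The abstract does not spell out how RI-LSH works, so the sketch below is not the paper's algorithm. It only illustrates the general idea behind projection-based hashing for furthest neighbor search, under stated assumptions: points are projected onto a few random Gaussian directions, the points whose projections lie furthest from the query's projection become candidates, and only those candidates are verified with true distances. All names and parameters (`make_projections`, `approx_furthest`, `candidates_per_proj`) are hypothetical, and the sketch runs in memory with no external-memory layout or probabilistic guarantee.

```python
import math
import random


def make_projections(dim, m, seed=0):
    """Draw m random directions with i.i.d. standard Gaussian coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(vec, direction):
    """Inner product of a point with one projection direction."""
    return sum(x * a for x, a in zip(vec, direction))


def approx_furthest(points, q, projections, candidates_per_proj=2):
    """Return (index of best candidate, number of candidates verified)."""
    candidates = set()
    for direction in projections:
        pq = project(q, direction)
        # Rank points by how far their projection lies from the query's.
        ranked = sorted(
            range(len(points)),
            key=lambda i: abs(project(points[i], direction) - pq),
            reverse=True,
        )
        candidates.update(ranked[:candidates_per_proj])
    # Verify only the candidates with true Euclidean distances.
    best = max(candidates, key=lambda i: math.dist(points[i], q))
    return best, len(candidates)


rng = random.Random(42)
data = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(200)]
query = [0.0] * 16
index, verified = approx_furthest(data, query, make_projections(16, m=8))
print(index, verified, math.dist(data[index], query))
```

The design choice this illustrates is that distance computations are limited to a small candidate set (at most `m * candidates_per_proj` points) instead of the whole dataset, which is the kind of saving an I/O-efficient index aims for.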

Author information

Corresponding author

Correspondence to Wanqi Liu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, W., Wang, H., Zhang, Y., Qin, L., Zhang, W. (2020). I/O Efficient Algorithm for c-Approximate Furthest Neighbor Search in High-Dimensional Space. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science, vol 12114. Springer, Cham. https://doi.org/10.1007/978-3-030-59419-0_14

  • DOI: https://doi.org/10.1007/978-3-030-59419-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59418-3

  • Online ISBN: 978-3-030-59419-0

  • eBook Packages: Computer Science, Computer Science (R0)
