Optimizing Fair Approximate Nearest Neighbor Searches Using Threaded B+-Trees

  • Conference paper
Similarity Search and Applications (SISAP 2021)

Abstract

Similarity search in high-dimensional spaces is an important primitive operation in many diverse application domains. Locality Sensitive Hashing (LSH) is a popular technique for solving the Approximate Nearest Neighbor (ANN) problem in high-dimensional spaces. Alongside the effort to create fair machine learning models, there is also a need for data structures that target different types of fairness. In this paper, we propose a fair variant of the ANN problem that targets equal opportunity, a notion of group fairness, and we formally define fair ANN under this notion. Additionally, we present FairLSH, an efficient disk-based index structure that uses Locality Sensitive Hashing to find fair approximate nearest neighbors. Moreover, we present an advanced version of FairLSH that uses cost models to further balance the trade-off between I/O cost and processing time. Finally, we experimentally show that FairLSH returns fair results with very low I/O cost and processing time compared with state-of-the-art LSH techniques.

Corresponding author

Correspondence to Omid Jafari.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Jafari, O., Maurya, P., Islam, K.M., Nagarkar, P. (2021). Optimizing Fair Approximate Nearest Neighbor Searches Using Threaded B+-Trees. In: Reyes, N., et al. Similarity Search and Applications. SISAP 2021. Lecture Notes in Computer Science, vol 13058. Springer, Cham. https://doi.org/10.1007/978-3-030-89657-7_11

  • DOI: https://doi.org/10.1007/978-3-030-89657-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89656-0

  • Online ISBN: 978-3-030-89657-7

  • eBook Packages: Computer Science, Computer Science (R0)
