LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data

  • Conference paper
Data Science (ICDS 2019)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1179))

Abstract

Uncertain data is widely used in many practical applications, such as data cleaning, location-based services, and privacy protection. As technology develops, such data is becoming increasingly high-dimensional. The most common indexes for nearest neighbor search on uncertain data are the R-Tree and the KD-Tree, both of which inevitably suffer from the “curse of dimensionality”. To address this problem, this paper proposes a new hashing algorithm, called the LSR-forest, which is based on locality-sensitive hashing (LSH) and the R-Tree, to solve the approximate nearest neighbor search problem on high-dimensional uncertain data. The LSR-forest hashes similar high-dimensional uncertain data into the same bucket with high probability, and then constructs multiple R-Tree-based indexes over the hashed buckets. At query time, neighbors can be identified by checking the data in the hypercube that contains the query point. The query range can also be adjusted automatically according to the parameter k. Experiments on several data sets are presented in this paper. The results show that the LSR-forest is more effective and more efficient than the R-Tree on high-dimensional data sets.
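The bucketing idea in the abstract can be illustrated with a minimal sketch of generic p-stable (Gaussian) LSH. This is not the authors' LSR-forest: the abstract does not give the hash family, the parameters, or how uncertainty is modeled, so everything below is an assumption. In particular, each uncertain object is reduced to a single representative point (for example its expected location), a linear scan of the candidate bucket stands in for the R-Tree built over each bucket, and the automatic range adjustment by k is not covered. The names `make_hash`, `LshIndex`, `num_tables`, `hashes_per_table`, and `width` are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """Build one p-stable LSH function h(v) = floor((a . v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random Gaussian direction
    b = rng.uniform(0.0, width)                    # random offset in [0, width)

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / width)

    return h


class LshIndex:
    """Several hash tables; each key concatenates a few hash values.

    Assumption: a plain list per bucket replaces the R-Tree that the
    abstract describes building over the hashed buckets.
    """

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.points = []
        self.tables = []
        for _ in range(num_tables):
            funcs = [make_hash(dim, width, rng) for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))

    @staticmethod
    def _key(funcs, v):
        return tuple(h(v) for h in funcs)

    def add(self, v):
        """Insert a representative point and return its integer id."""
        idx = len(self.points)
        self.points.append(tuple(v))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def query(self, q, k):
        """Return ids of up to k approximate nearest neighbors of q."""
        candidates = set()
        for funcs, buckets in self.tables:
            # Only points sharing a bucket with q in some table are examined.
            candidates.update(buckets.get(self._key(funcs, q), ()))
        ranked = sorted(candidates, key=lambda i: math.dist(self.points[i], q))
        return ranked[:k]
```

A query may return fewer than k ids when the matching buckets are sparse. Widening `width` or adding tables trades speed for recall in this sketch; how the paper itself adapts the search range to k is not stated in the abstract.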

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, J., Qian, T., Yang, A., Wang, H., Qian, J. (2020). LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_17

  • DOI: https://doi.org/10.1007/978-981-15-2810-1_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2809-5

  • Online ISBN: 978-981-15-2810-1

  • eBook Packages: Computer Science, Computer Science (R0)
