LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data

Wang, Jiagang; Qian, Tu; Yang, Anbang; Wang, Hui; Qian, Jiangbo

doi:10.1007/978-981-15-2810-1_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1179)

Included in the following conference series:

International Conference on Data Service

Abstract

Uncertain data arises in many practical applications, such as data cleaning, location-based services, and privacy protection. As technology develops, such data tends to become high-dimensional. The most common indexes for nearest neighbor search on uncertain data are the R-Tree and the KD-Tree, and both inevitably suffer from the “curse of dimensionality”. To address this problem, this paper proposes a new hash algorithm, called the LSR-forest, which is based on locality-sensitive hashing and the R-Tree, for approximate nearest neighbor search on high-dimensional uncertain data. The LSR-forest hashes similar high-dimensional uncertain data into the same bucket with high probability and then constructs multiple R-Tree-based indexes over the hashed buckets. At query time, neighbors are identified by checking the data in the hypercube that contains the query point. The query range can also be adjusted automatically according to the parameter k. Experiments on several data sets are presented. The results show that the LSR-forest is more effective and efficient than the R-Tree on high-dimensional datasets.
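The bucketing idea described in the abstract can be illustrated with a minimal sketch of generic locality-sensitive hashing. This is not the LSR-forest itself. It assumes Gaussian random projections as the hash family, represents each uncertain object by a single point (for example, its expected position), and replaces the per-bucket R-Tree with a plain linear scan. The names `make_hash` and `LshIndex`, and all parameter values, are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, n_proj, width, rng):
    """Build one hash function h(v) = floor((a . v + b) / width) per projection."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_proj)]
    offsets = [rng.uniform(0.0, width) for _ in range(n_proj)]

    def h(v):
        # The tuple of quantized projections identifies one bucket.
        return tuple(
            math.floor((sum(a * x for a, x in zip(plane, v)) + b) / width)
            for plane, b in zip(planes, offsets)
        )

    return h


class LshIndex:
    """Several independent hash tables; nearby points tend to share buckets."""

    def __init__(self, dim, n_tables=8, n_proj=4, width=4.0, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, n_proj, width, rng) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def add(self, point):
        """Store a point and register its index in one bucket per table."""
        idx = len(self.points)
        self.points.append(point)
        for h, table in zip(self.hashes, self.tables):
            table[h(point)].append(idx)
        return idx

    def query(self, q, k):
        """Return up to k indices of candidates closest to q.

        Assumption: a linear scan over the colliding buckets stands in
        for the R-Tree lookup that the paper builds per bucket.
        """
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(q), ()))
        ranked = sorted(candidates, key=lambda i: math.dist(q, self.points[i]))
        return ranked[:k]
```

A query only inspects points that collide with it in at least one table, so the result is approximate: a true neighbor that lands in different buckets in every table is missed. Increasing `n_tables` or `width` raises recall at the cost of scanning more candidates.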

Author information

Authors and Affiliations

Ningbo University, Ningbo, 315211, Zhejiang, China
Jiagang Wang, Tu Qian, Anbang Yang, Hui Wang & Jiangbo Qian

Editor information

Editors and Affiliations

Swinburne University of Technology, Melbourne, VIC, Australia
Jing He
University of Illinois at Chicago, Chicago, USA
Philip S. Yu
College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA
Yong Shi
Research Institute of Extenics and Innovation Methods, Guangdong University of Technology, Guangzhou, China
Xingsen Li
Ningbo University, Ningbo, China
Zhijun Xie
Deakin University, Burwood, VIC, Australia
Guangyan Huang
Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
Jie Cao
Nanjing University of Posts and Telecommunications, Nanjing, China
Fu Xiao

About this paper

Cite this paper

Wang, J., Qian, T., Yang, A., Wang, H., Qian, J. (2020). LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_17

DOI: https://doi.org/10.1007/978-981-15-2810-1_17
Published: 02 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2809-5
Online ISBN: 978-981-15-2810-1
eBook Packages: Computer Science, Computer Science (R0)
