Abstract
Uncertain data is widely used in many practical applications, such as data cleaning, location-based services, privacy protection and so on. With the development of technology, the data has a tendency to high-dimensionality. The most common indexes for nearest neighbor search on uncertain data are the R-Tree and the KD-Tree. These indexes will inevitably bring about “curse of dimension”. Focus on this problem, this paper proposes a new hash algorithm, called the LSR-forest, which based on the locality-sensitive hashing and the R-Tree, to solve the high-dimensional uncertain data approximate neighbor search problem. The LSR-forest can hash similar high dimensional uncertain data into a same bucket with a high probability, and then constructs multiple R-Tree-based indexes for hashed buckets. When querying, it is possible to judge neighbors by checking the data in the hypercube which the query point is in. One can also adjust the query range automatically by different parameter k. Many experiments on different data sets are presented in this paper. The results show that LSR-forest has better effectiveness and efficiency than R-Tree on high-dimensional datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Xiaoye, M., Yunjun, G., Gang, C.: Processing incomplete k nearest neighbor search. IEEE Trans. Fuzzy Syst. 24(6), 1349–1363 (2016)
Sistla, A., Wolfson, O., Xu, B.: Continuous nearest-neighbor queries with location uncertainty. VLDB 24(1), 25–50 (2015)
Jian, L., Haiao, W.: Range queries on uncertain data. Theoret. Comput. Sci. 609(1), 32–48 (2016)
Lin, J.C.W., Gan, W., Fournier-Viger, P., Hong, T.P., Chao, H.C.: Mining weighted frequent itemsets without candidate generation in uncertain databases. Int. J. Inf. Technol. Decis. Making 16(06), 1549–1579 (2017)
Ebrahimnejad, A., Tavana, M., Nasseri, S.H., Gholami, O.F.: A new method for solving dual DEA problems with fuzzy stochastic data. Int. J. Inf. Technol. Decis. Making 18(01), 147–170 (2019)
Jianhua, J., Yujun, C., Xianqiu, M., Limin, W., Keqin, L.: A novel density peaks clustering algorithm based on k nearest neighbours for improving assignment process. Phys. A 523, 702–713 (2019)
Jiang, J., Chen, Y., Hao, D., Li, K.: DPC-LG: density peaks clustering based on logistic distribution and gravitation. Phys. A 514, 25–35 (2019)
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD International Conference on Management of Data, New York, NY, USA, vol. 14(2), pp. 47–57 (1984)
Peng, Y., Li, H., Cui, J.: An efficient range query model over encrypted outsourced data using secure k-d tree. In: International Conference on Networking and Network Applications, pp. 250–253 (2016)
Weber, R., Schek, H., Blot, S.: A quantitative analysis and performance study for similarity search methods in high-dimensional spaces. In: International Conference on Very Large Data Bases, New York, pp. 194–205 (1998)
Zhenyun, D., Xiaoshu, Z., Debo, C., Ming, Z., Shichao, Z.: Efficient kNN classification algorithm for big data. Neurocomputing 195, 143–148 (2016)
Giyasettin Ozcan, F.: Unsupervised learning from multi-dimensional data: a fast clustering algorithm utilizing canopies and statistical information. Int. J. Inf. Technol. Decis. Making 17(03), 841–856 (2018)
Cheng, R., Dmitri, V., Sunil, P.: Evaluating probabilistic queries over imprecise data. In: ACM SIGMOD International Conference on Management of Data, New York, USA, pp. 551–562 (2003)
Lianmeng, J., Xiaojiao, G., Quanpan, B.: KNN: k-nearest neighbor classifier with pairwise distance metrics and belief function theory. IEEE Access 7, 48935–48947 (2019)
Ljosa, V., Singh, A.: APLA: indexing arbitrary probability distributions. In: Proceedings ICDE, Turkey, pp. 946–955 (2007)
Reynold, C., Prabhakar, S., Dmitri, V.: Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. J. 16(9), 1112–1127 (2003)
Kriegel, H.-P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 337–348. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71703-4_30
Reynold, C., Jinchuan, C., Mohamed, M.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: Proceedings of International Conference on Data Engineering (ICDE). Piscataway, NJ, pp. 973–982. IEEE (2008)
Reynold, C., Lei, C., Jinchuan, C.: Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In: Proceedings of International Conference on Extending Database Technology, New York, pp. 672–683 (2009)
Gionis, A., Indyky, P., Motwaniz, R.: Similarity search in high dimensions via hashing. In: International Conference on Very Large Data Bases, Cairo, Egypt, pp. 518–529 (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, J., Qian, T., Yang, A., Wang, H., Qian, J. (2020). LSR-Forest: An LSH-Based Approximate k-Nearest Neighbor Query Algorithm on High-Dimensional Uncertain Data. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_17
Download citation
DOI: https://doi.org/10.1007/978-981-15-2810-1_17
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2809-5
Online ISBN: 978-981-15-2810-1
eBook Packages: Computer ScienceComputer Science (R0)