Abstract
Similarity search is a critical operation in many applications. When all attributes of objects are used to determine their similarity, most prior similarity search algorithms are easily skewed by a few attributes with high dissimilarity. The frequent k-n-match query was proposed to overcome this problem. However, the prior algorithm for processing frequent k-n-match queries is designed for static data, whose attributes are fixed, and is not suitable for dynamic data. In this paper, we therefore propose two schemes to process continuous frequent k-n-match queries over dynamic data. First, we introduce the concept of a safe region and devise four formulae to compute safe regions. Then, we develop scheme CFKNMatchAD-C, which speeds up the processing of continuous frequent k-n-match queries by using safe regions to avoid unnecessary query re-evaluations. To reduce the amount of data transmitted by networked data sources, scheme CFKNMatchAD-C also uses safe regions to suppress the transmission of data updates that cannot affect query results. Moreover, for large-scale environments, we propose scheme CFKNMatchAD-D, which extends scheme CFKNMatchAD-C to employ multiple servers to process continuous frequent k-n-match queries. Experimental results show that both scheme CFKNMatchAD-C and scheme CFKNMatchAD-D outperform the prior algorithm in terms of average response time and the amount of network traffic produced.
Cite this article
Chiu, SC., Huang, JL. & Huang, JH. On processing continuous frequent K-N-match queries for dynamic data over networked data sources. Knowl Inf Syst 31, 547–579 (2012). https://doi.org/10.1007/s10115-011-0413-5
DOI: https://doi.org/10.1007/s10115-011-0413-5