On processing continuous frequent K-N-match queries for dynamic data over networked data sources

Chiu, Shih-Chuan; Huang, Jiun-Long; Huang, Jen-He

doi:10.1007/s10115-011-0413-5

Regular Paper
Published: 20 May 2011

Volume 31, pages 547–579, (2012)

Knowledge and Information Systems

Abstract

Similarity search is a critical operation in many applications. When all attributes of objects are used to determine their similarity, most prior similarity search algorithms are easily influenced by a few attributes with high dissimilarity. The frequent k-n-match query was proposed to overcome this problem. However, the prior algorithm for processing frequent k-n-match queries is designed for static data, whose attributes are fixed, and is not suitable for dynamic data. In this paper, we therefore propose two schemes to process continuous frequent k-n-match queries over dynamic data. First, we introduce the concept of a safe region and devise four formulae to compute safe regions. Then, we develop scheme CFKNMatchAD-C, which speeds up the processing of continuous frequent k-n-match queries by using safe regions to avoid unnecessary query re-evaluations. To reduce the amount of data transmitted by networked data sources, scheme CFKNMatchAD-C also uses safe regions to eliminate transmissions of data updates that will not affect the query results. Moreover, for large-scale environments, we propose scheme CFKNMatchAD-D, which extends scheme CFKNMatchAD-C to employ multiple servers to process continuous frequent k-n-match queries. Experimental results show that schemes CFKNMatchAD-C and CFKNMatchAD-D outperform the prior algorithm in terms of average response time and the amount of network traffic produced.

Author information

Authors and Affiliations

Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
Shih-Chuan Chiu, Jiun-Long Huang & Jen-He Huang

Corresponding author

Correspondence to Jiun-Long Huang.

About this article

Cite this article

Chiu, SC., Huang, JL. & Huang, JH. On processing continuous frequent K-N-match queries for dynamic data over networked data sources. Knowl Inf Syst 31, 547–579 (2012). https://doi.org/10.1007/s10115-011-0413-5

Received: 09 December 2010
Revised: 31 March 2011
Accepted: 06 May 2011
Published: 20 May 2011
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10115-011-0413-5
