On processing continuous frequent K-N-match queries for dynamic data over networked data sources

  • Regular Paper
Knowledge and Information Systems

Abstract

Similarity search is a critical issue in many applications. When all attributes of objects are used to determine their similarity, most prior similarity search algorithms are easily influenced by a few attributes with high dissimilarity. The frequent k-n-match query was proposed to overcome this problem. However, the prior algorithm for processing frequent k-n-match queries is designed for static data, whose attributes are fixed, and is not suitable for dynamic data. In this paper, we therefore propose two schemes to process continuous frequent k-n-match queries over dynamic data. First, the concept of a safe region is proposed, and four formulae are devised to compute safe regions. Then, scheme CFKNMatchAD-C is developed to speed up the processing of continuous frequent k-n-match queries by using safe regions to avoid unnecessary query re-evaluations. To reduce the amount of data transmitted by networked data sources, scheme CFKNMatchAD-C also uses safe regions to eliminate the transmission of data updates that will not affect query results. Moreover, for large-scale environments, we further propose scheme CFKNMatchAD-D, which extends scheme CFKNMatchAD-C to employ multiple servers to process continuous frequent k-n-match queries. Experimental results show that schemes CFKNMatchAD-C and CFKNMatchAD-D outperform the prior algorithm in terms of average response time and the amount of network traffic produced.
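
The abstract does not spell out the query itself, so the following is only a minimal brute-force sketch of a frequent k-n-match query. It assumes the usual matching-based definition: the n-match difference between an object and the query is the n-th smallest per-attribute absolute difference, a k-n-match query returns the k objects with the smallest n-match difference, and a frequent k-n-match query returns the k objects that appear most often in the k-n-match answers over a range of n. The function names, the tie-breaking by object ID, and the sample data are illustrative assumptions. The sketch covers static data only and does not implement safe regions or either CFKNMatchAD scheme.

```python
from collections import Counter


def n_match_difference(p, q, n):
    """Return the n-th smallest per-attribute absolute difference (n is 1-based)."""
    diffs = sorted(abs(a - b) for a, b in zip(p, q))
    return diffs[n - 1]


def k_n_match(objects, q, k, n):
    """Return the IDs of the k objects with the smallest n-match difference to q.

    Ties are broken by object ID (an assumption made for determinism).
    """
    ranked = sorted(
        objects, key=lambda oid: (n_match_difference(objects[oid], q, n), oid)
    )
    return ranked[:k]


def frequent_k_n_match(objects, q, k, n_range):
    """Return the k object IDs that occur most often in the k-n-match answers
    computed for every n in n_range."""
    counts = Counter()
    for n in n_range:
        counts.update(k_n_match(objects, q, k, n))
    ranked = sorted(counts, key=lambda oid: (-counts[oid], oid))
    return ranked[:k]


# Hypothetical data: object "a" matches the query on two attributes
# and differs strongly on the third.
objects = {
    "a": [1.0, 1.0, 9.0],
    "b": [3.0, 3.0, 3.0],
    "c": [6.0, 7.0, 8.0],
}
query = [1.0, 1.0, 1.0]

result = frequent_k_n_match(objects, query, k=1, n_range=range(1, 4))
print(result)
```

In this toy data set, object "a" wins for n = 1 and n = 2 and loses only for n = 3, so it is the frequent 1-n-match answer even though its single outlying attribute would push it behind "b" under Euclidean distance. This is the robustness to a few highly dissimilar attributes that the abstract describes.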

Author information

Corresponding author

Correspondence to Jiun-Long Huang.

About this article

Cite this article

Chiu, SC., Huang, JL. & Huang, JH. On processing continuous frequent K-N-match queries for dynamic data over networked data sources. Knowl Inf Syst 31, 547–579 (2012). https://doi.org/10.1007/s10115-011-0413-5

  • DOI: https://doi.org/10.1007/s10115-011-0413-5
