
Efficient monitoring of skyline queries over distributed data streams

  • Regular Paper
Knowledge and Information Systems

Abstract

Data management and data mining over distributed data streams have recently received considerable attention within the database community. This paper is the first work to address skyline queries over distributed data streams, where the streams originate from multiple horizontally partitioned data sources. A skyline query returns the set of interesting objects that are not dominated by any other object in the base dataset. Previous work has concentrated on skyline computation over static data or centralized data streams. We present an efficient and effective algorithm called BOCS to handle this problem in the more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm, GridSky, and an associated communication protocol. Following the progressive refinement strategy of BOCS, the skyline is computed incrementally in two phases. In the first phase, the local skylines on the remote sites are maintained by GridSky, and at each time instant only the skyline increments on the remote sites are sent to the coordinator. In the second phase, the global skyline is obtained by integrating the remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms that use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.
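The dominance relation and the two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GridSky algorithm or the BOCS protocol: it uses a naive quadratic skyline, assumes smaller values are better in every dimension, and treats the streams as append-only (no sliding-window expiry). The names `dominates`, `skyline`, `RemoteSite`, and `Coordinator` are hypothetical and chosen for illustration only.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when it
    is no worse in all dimensions and strictly better in at least one.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points not dominated by any other point."""
    pts = list(dict.fromkeys(points))  # drop duplicates, preserve order
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


class RemoteSite:
    """Hypothetical remote site that maintains a local skyline (phase 1)."""

    def __init__(self) -> None:
        self.local_skyline: List[Point] = []

    def insert(self, batch: Iterable[Point]) -> List[Point]:
        """Absorb new points and return only the skyline increment."""
        new_skyline = skyline(self.local_skyline + list(batch))
        increment = [p for p in new_skyline if p not in self.local_skyline]
        self.local_skyline = new_skyline
        return increment


class Coordinator:
    """Hypothetical coordinator that integrates increments (phase 2)."""

    def __init__(self) -> None:
        self.global_skyline: List[Point] = []

    def merge(self, increment: Iterable[Point]) -> List[Point]:
        """Combine a remote increment with the latest global skyline."""
        self.global_skyline = skyline(self.global_skyline + list(increment))
        return self.global_skyline
```

In this sketch a site that receives only dominated points sends nothing, which is the intuition behind shipping increments rather than whole local skylines. Handling expired points, as a sliding-window setting would require, is deliberately left out.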



Author information


Corresponding author

Correspondence to Shengli Sun.


About this article

Cite this article

Sun, S., Huang, Z., Zhong, H. et al. Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25, 575–606 (2010). https://doi.org/10.1007/s10115-009-0269-0


  • DOI: https://doi.org/10.1007/s10115-009-0269-0
