Abstract
Data management and data mining over distributed data streams have recently received considerable attention within the database community. This paper is the first work to address skyline queries over distributed data streams, where the streams derive from multiple horizontally split data sources. A skyline query returns the set of interesting objects that are not dominated by any other object in the base dataset. Previous work has concentrated on skyline computation over static data or centralized data streams. We present an efficient and effective algorithm called BOCS to handle skyline queries in the more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm, GridSky, and an associated communication protocol. Following BOCS's strategy of progressive refinement, the skyline is computed incrementally in two phases. In the first phase, GridSky maintains the local skylines on the remote sites; at each point in time, only the skyline increments on the remote sites are sent to the coordinator. In the second phase, the global skyline is obtained by integrating the remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms that use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.
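The dominance relation and the two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the BOCS or GridSky implementation: it uses a naive quadratic skyline, assumes that smaller values are better in every dimension, and handles only newly arriving points (no window expiry or deletions). The names `dominates`, `skyline`, and `merge_increment` are hypothetical and chosen for illustration only.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive skyline: keep every point not dominated by another point."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated, so it cannot be a skyline point
        # p survives; drop any current skyline point that p dominates
        result = [q for q in result if not dominates(p, q)]
        if p not in result:
            result.append(p)
    return result


def merge_increment(global_sky: Iterable[Point],
                    increment: Iterable[Point]) -> List[Point]:
    """Coordinator step (insert-only simplification): integrate a remote
    site's new local skyline points into the latest global skyline."""
    return skyline(list(global_sky) + list(increment))
```

In this sketch each remote site would call `skyline` on its own data and ship only the resulting points, and the coordinator would fold each shipment into its current result with `merge_increment`. A point that is in a local skyline, such as `(4, 4)` at one site, can still be removed globally once another site reports `(3, 3)`.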
Cite this article
Sun, S., Huang, Z., Zhong, H. et al. Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25, 575–606 (2010). https://doi.org/10.1007/s10115-009-0269-0
DOI: https://doi.org/10.1007/s10115-009-0269-0