Abstract
Skyline query processing over uncertain data streams has recently attracted considerable attention in the database community because it helps users make intelligent decisions over complex data in many real applications. Many recent efforts have addressed skyline computation over data streams in a centralized environment, typically with a single processor. These methods do not adapt well to skyline queries over complex uncertain streaming data, owing to the computational complexity of the query and the limited processing capability of a single processor. Moreover, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, because they were all developed for parallel skyline queries over static, certain data sets. In this paper, we formally define the parallel skyline query problem over uncertain data streams under the sliding window streaming model. In particular, we propose, for the first time, an effective framework, named the distributed parallel framework, that addresses the problem through sliding window partitioning. We further propose an efficient approach, parallel streaming skyline, that optimizes the parallel skyline computation with an optimized streaming item mapping strategy and a grid index. Extensive experiments with a real deployment over synthetic and real data demonstrate the effectiveness and efficiency of the proposed techniques.
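The abstract does not show why partitioning a sliding window lets the skyline computation run in parallel. The following is a minimal sketch under assumptions of our own. It uses a common tuple-level uncertainty model in which each stream element e carries an independent existence probability P(e) and smaller attribute values are preferred. Under that model, the skyline probability of e within window W is P_sky(e) = P(e) × ∏ (1 − P(e')), where the product runs over every e' in W that dominates e. The q-skyline is the set of elements with P_sky(e) ≥ q. Because this product factorizes over any disjoint split of the window, each partition can compute its partial product independently, and a coordinator multiplies the results. The round-robin partitioning, the function names (`dominates`, `partial_products`, `skyline_probabilities`, `sliding_window_skyline`) and the threshold `q` are hypothetical illustrations. They are not the paper's distributed parallel framework or parallel streaming skyline approach. The grid index is not modeled, and the "workers" run sequentially here.

```python
from collections import deque
from math import prod

# A stream element is (id, point, existence probability).
# Assumptions: smaller values are preferred; elements occur independently.

def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def partial_products(window, partition):
    """Worker-side step: for every window element, multiply (1 - P(e'))
    over its dominators e' that fall in this partition."""
    return {
        eid: prod(1.0 - p2 for _, pt2, p2 in partition if dominates(pt2, pt))
        for eid, pt, _ in window
    }

def skyline_probabilities(window, k):
    """P_sky(e) = P(e) * product over dominators e' in the window of (1 - P(e')).

    Hypothetical round-robin split into k partitions; since the product
    factorizes over disjoint partitions, each partial product could run on a
    separate worker and be merged by a coordinator.
    """
    partitions = [window[i::k] for i in range(k)]
    partials = [partial_products(window, part) for part in partitions]
    return {eid: p * prod(part[eid] for part in partials) for eid, _, p in window}

def sliding_window_skyline(stream, n, q, k=2):
    """After each arrival, yield the sorted ids whose skyline probability is
    at least q within the count-based window of the n most recent elements."""
    window = deque(maxlen=n)  # the oldest element expires automatically
    for elem in stream:
        window.append(elem)
        probs = skyline_probabilities(list(window), k)
        yield sorted(eid for eid, ps in probs.items() if ps >= q)
```

For example, with a window of the 3 most recent elements and q = 0.5, the stream a (1, 1; 0.8), b (2, 3; 0.9), c (3, 2; 0.7), d (4, 4; 0.5) yields ['a'], ['a'], ['a'] and then ['b', 'c']. Once a expires, b and c are no longer dominated, while d stays dominated by both. Splitting by arrival order is only one possible mapping of stream items to workers. A real system would also have to balance load and avoid recomputing every element on each arrival.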
Acknowledgments
We are grateful to the anonymous reviewers for their very useful comments and suggestions. This work was supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2011CB302601), the National Natural Science Foundation of China (Grant No. 61379052), the National High Technology Research and Development 863 Program of China (Grant No. 2013AA01A213), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. S2010J5050), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015).
About this article
Cite this article
Li, X., Wang, Y., Li, X. et al. Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index. Knowl Inf Syst 41, 277–309 (2014). https://doi.org/10.1007/s10115-013-0725-8
DOI: https://doi.org/10.1007/s10115-013-0725-8