
Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index

  • Regular Paper
Knowledge and Information Systems

Abstract

Skyline query processing over uncertain data streams has recently attracted considerable attention in the database community because it helps users make intelligent decisions over complex data in many real applications. Although much recent effort has been devoted to skyline computation over data streams in centralized environments, typically with a single processor, these methods cannot be readily adapted to skyline queries over complex uncertain streaming data, owing to the computational complexity of the query and the limited processing capability of a single processor. Furthermore, none of the existing studies on parallel skyline computation can effectively address skyline queries over uncertain data streams, because they were all developed for parallel skyline queries over static, certain data sets. In this paper, we formally define the parallel skyline query problem over uncertain data streams under the sliding window streaming model. In particular, we propose for the first time an effective framework, named the distributed parallel framework, that addresses the problem through sliding window partitioning. We further propose an efficient approach, parallel streaming skyline, that optimizes the parallel skyline computation with an optimized streaming item mapping strategy and a grid index. Extensive experiments with a real deployment over synthetic and real data demonstrate the effectiveness and efficiency of the proposed techniques.
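
Why partitioning a sliding window works for this query can be illustrated with a small sketch. It assumes the tuple-level uncertainty model often used for probabilistic skylines over sliding windows: each element e carries an occurrence probability P(e), its skyline probability in window W is P(e) · Π(1 − P(e′)) over the elements e′ ∈ W that dominate e, and e belongs to the q-skyline if this probability is at least a threshold q. Because that product factorizes over any disjoint split of the window, each worker can compute a partial product over its own share, and a coordinator only multiplies the partial results. The round-robin mapping of arrivals to workers, the uniform grid, and the names `Element`, `GridPartition`, and `parallel_q_skyline` are hypothetical stand-ins chosen for illustration; they are not the paper's streaming item mapping strategy or grid index.

```python
from dataclasses import dataclass
from math import floor, prod
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    ident: int                 # arrival sequence number
    coords: Tuple[float, ...]  # assumption: smaller is better in every dimension
    prob: float                # assumption: occurrence probability P(e) of the element


def dominates(a: Element, b: Element) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a.coords, b.coords))
    better = any(x < y for x, y in zip(a.coords, b.coords))
    return no_worse and better


class GridPartition:
    """One worker's share of the window, bucketed into a uniform grid (hypothetical)."""

    def __init__(self, cell_width: float) -> None:
        self.cell_width = cell_width
        self.cells: Dict[Tuple[int, ...], List[Element]] = {}

    def cell_of(self, e: Element) -> Tuple[int, ...]:
        return tuple(floor(x / self.cell_width) for x in e.coords)

    def insert(self, e: Element) -> None:
        self.cells.setdefault(self.cell_of(e), []).append(e)

    def non_dominance(self, e: Element) -> float:
        """Product of (1 - P(e')) over the local elements e' that dominate e."""
        target = self.cell_of(e)
        result = 1.0
        for cell, members in self.cells.items():
            # A cell with a larger index in any dimension holds only points that are
            # strictly worse than e in that dimension, so none of them can dominate e.
            if any(c > t for c, t in zip(cell, target)):
                continue
            for other in members:
                if dominates(other, e):
                    result *= 1.0 - other.prob
        return result


def parallel_q_skyline(stream: List[Element], window_size: int, num_workers: int,
                       cell_width: float, q: float) -> Dict[int, float]:
    """Skyline probabilities >= q for the last `window_size` elements of `stream`."""
    window = list(stream)[-window_size:]  # count-based sliding window
    workers = [GridPartition(cell_width) for _ in range(num_workers)]
    for e in window:
        # Hypothetical round-robin mapping by arrival number.
        workers[e.ident % num_workers].insert(e)
    result: Dict[int, float] = {}
    for e in window:
        # The workers hold disjoint parts of the window, so the global product of
        # (1 - P(e')) over all dominators is the product of the partial products.
        p_sky = e.prob * prod(w.non_dominance(e) for w in workers)
        if p_sky >= q:
            result[e.ident] = p_sky
    return result
```

For instance, if elements at (1, 1) with P = 0.5 and at (2, 2) with P = 0.9 share a window, the second element's skyline probability is 0.9 × (1 − 0.5) = 0.45 however the two are assigned to workers. In this sketch, the grid only lets a worker skip whole cells that cannot contain a dominator; a real streaming deployment would also maintain the partitions incrementally as elements arrive and expire instead of rebuilding them for each query.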


[Figs. 1–16]



Acknowledgments

We are grateful to the anonymous reviewers for their very useful comments and suggestions. This work was supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2011CB302601), the National Natural Science Foundation of China (Grant No. 61379052), the National High Technology Research and Development 863 Program of China (Grant No. 2013AA01A213), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. S2010J5050), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015).

Author information

Corresponding author

Correspondence to Yijie Wang.

About this article

Cite this article

Li, X., Wang, Y., Li, X. et al. Parallelizing skyline queries over uncertain data streams with sliding window partitioning and grid index. Knowl Inf Syst 41, 277–309 (2014). https://doi.org/10.1007/s10115-013-0725-8

  • DOI: https://doi.org/10.1007/s10115-013-0725-8
