Toward Sliding Time Window of Low Watermark to Detect Delayed Stream Arrival

  • Conference paper

Abstract

Unexpected events, such as time gaps between input streams, operator misoperation, and network delay, can cause a stream processing system to produce unbounded out-of-order data streams. Recent work on this issue focuses on explicit punctuation or heartbeats to handle faults and stragglers (outlier data). Most parallel and distributed stream processing models, such as Google MillWheel and Apache Flink, rely on hot replication, logging, and upstream backup, all of which are expensive, and these frameworks ignore straggler processing. Some recent frameworks, such as Google MillWheel and Apache Flink, handle disorder only at the operator level, and only point-in-time and fixed-window low watermarks have been discussed. We therefore propose a new sliding time window of low watermarks to detect delayed stream arrival. The contributions of our method are adaptive low watermarks, the distinction between stragglers and late data, and dynamic rectification of the low watermark. Experiments show that our method tolerates more late data while still detecting stragglers accurately.
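The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a sliding-window low watermark could separate late data from stragglers. It is not the authors' method. The class name, the parameters (window_size, base_lateness, straggler_factor), and the rule "tolerance equals the largest lag in the recent window" are all assumptions made for illustration.

```python
from collections import deque


class SlidingLowWatermark:
    """Hypothetical sketch: a low watermark adapted over a sliding window of lags."""

    def __init__(self, window_size=5, base_lateness=2.0, straggler_factor=3.0):
        # Lags (max event time seen minus event time) of the most recent events.
        self.lags = deque(maxlen=window_size)
        self.base_lateness = base_lateness        # assumed minimum tolerance
        self.straggler_factor = straggler_factor  # assumed outlier threshold
        self.max_event_time = float("-inf")

    def allowed_lateness(self):
        # Adaptive tolerance: the largest lag in the window, never below the base.
        # When a large lag slides out of the window, the tolerance shrinks again.
        return max([self.base_lateness, *self.lags])

    def low_watermark(self):
        return self.max_event_time - self.allowed_lateness()

    def observe(self, event_time):
        """Classify one event as 'on_time', 'late', or 'straggler'."""
        if event_time >= self.max_event_time:
            self.max_event_time = event_time
            self.lags.append(0.0)
            return "on_time"
        lag = self.max_event_time - event_time
        tolerance = self.allowed_lateness()
        if lag > self.straggler_factor * tolerance:
            # Outlier: reported, but not allowed to widen the tolerance.
            return "straggler"
        # Tolerated delay: feed it back so the watermark adapts.
        self.lags.append(lag)
        return "on_time" if lag <= tolerance else "late"


detector = SlidingLowWatermark()
for t in [10, 11, 12, 11.5, 8, 13, 0]:
    print(t, detector.observe(t))
```

In this sketch, a moderately delayed event (event time 8 after 12 has been seen) is classified as late and widens the tolerance, while a far older event (event time 0) is flagged as a straggler and leaves the watermark unchanged.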

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61772231), the Shandong Provincial Natural Science Foundation (ZR2017MF025), the Project of Shandong Provincial Social Science Program (18CHLJ39), the Science and Technology Program of University of Jinan (XKY1734 & XKY1828), and the Project of Independent Cultivated Innovation Team of Jinan City (2018GXRC002).

Author information

Corresponding author

Correspondence to Kun Ma.

Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Zhang, X., Ma, K. (2021). Toward Sliding Time Window of Low Watermark to Detect Delayed Stream Arrival. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-67540-0_28

  • DOI: https://doi.org/10.1007/978-3-030-67540-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67539-4

  • Online ISBN: 978-3-030-67540-0

  • eBook Packages: Computer Science, Computer Science (R0)
