Abstract
Unexpected events such as time intervals between input streams, operator misoperation, and network delay can cause a stream processing system to produce unbounded out-of-order data streams. Recent work on this issue relies on explicit punctuation or heartbeats to handle faults and stragglers (outlier data). Most parallel and distributed stream processing models, such as Google MillWheel and Apache Flink, rely on costly hot replication, logging, and upstream backup, yet these frameworks ignore straggler processing. Recent frameworks such as Google MillWheel and Apache Flink handle disorder only at the operator level, and only point-in-time and fixed-window low watermarks have been discussed. We therefore propose a sliding time window of low watermarks to detect delayed stream arrival. Our contributions are adaptive low watermarks, the distinction of stragglers from late data, and dynamic rectification of the low watermark. Experiments show that our method tolerates more late data while still detecting stragglers accurately.
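As an illustration only, the idea of a low watermark maintained over a sliding time window can be sketched in a few lines of Python. This is not the authors' algorithm, which the abstract does not specify. The class name `SlidingLowWatermark`, the parameters `window` and `straggler_gap`, the choice of the minimum event time among recent arrivals as the watermark, and the three labels are all assumptions made for this sketch.

```python
from collections import deque


class SlidingLowWatermark:
    """Hypothetical sketch: a low watermark over a sliding arrival-time window."""

    def __init__(self, window, straggler_gap):
        self.window = window                # window length, in arrival-time units (assumed)
        self.straggler_gap = straggler_gap  # lag beyond which a record is a straggler (assumed)
        self.recent = deque()               # (arrival_time, event_time) of recent records
        self.watermark = float("-inf")      # current low watermark, never decreases

    def observe(self, arrival_time, event_time):
        # Classify the record against the watermark as it stood before this arrival.
        if event_time >= self.watermark:
            label = "on_time"
        elif event_time >= self.watermark - self.straggler_gap:
            label = "late"
        else:
            label = "straggler"

        # Stragglers are kept out of the window so one outlier cannot stall the watermark.
        if label != "straggler":
            self.recent.append((arrival_time, event_time))

        # Slide the window: drop records that arrived more than `window` units ago.
        while self.recent and self.recent[0][0] <= arrival_time - self.window:
            self.recent.popleft()

        # Rectify the watermark: smallest event time still in the window, kept monotonic.
        if self.recent:
            oldest_event = min(e for _, e in self.recent)
            self.watermark = max(self.watermark, oldest_event)
        return label


def classify_stream(records, window, straggler_gap):
    """Label each (arrival_time, event_time) pair in arrival order."""
    tracker = SlidingLowWatermark(window, straggler_gap)
    return [tracker.observe(a, e) for a, e in records]
```

In this sketch a record slightly behind the watermark is tolerated as late data, while one lagging by more than `straggler_gap` is flagged as a straggler and does not influence later watermark updates. Both thresholds would need tuning for a real workload.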
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61772231), the Shandong Provincial Natural Science Foundation (ZR2017MF025), the Project of Shandong Provincial Social Science Program (18CHLJ39), the Science and Technology Program of University of Jinan (XKY1734 & XKY1828), and the Project of Independent Cultivated Innovation Team of Jinan City (2018GXRC002).
Copyright information
© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Zhang, X., Ma, K. (2021). Toward Sliding Time Window of Low Watermark to Detect Delayed Stream Arrival. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-67540-0_28
DOI: https://doi.org/10.1007/978-3-030-67540-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67539-4
Online ISBN: 978-3-030-67540-0
eBook Packages: Computer Science, Computer Science (R0)