Abstract
Stream processing applications continuously process large amounts of online streaming data in real time or near real time. They have strict latency constraints. However, the continuous processing makes them vulnerable to any failures, and the recoveries may slow down the entire processing pipeline and break latency constraints. The upstream backup scheme is one of the most widely applied fault-tolerant schemes for stream processing systems. It introduces complex backup dependencies to tasks, which increases the difficulty of controlling recovery latencies. Moreover, when dependent tasks are located on the same processor, they fail at the same time in processor-level failures, bringing extra recovery latencies that increase the impacts of failures. This paper studies the relationship between the task allocation and the recovery latency of a stream processing application. We present a correlated failure effect model to describe the recovery latency of a stream topology in processor-level failures under a task allocation plan. We introduce a recovery-latency aware task allocation problem (RTAP) that seeks task allocation plans for stream topologies that will achieve guaranteed recovery latencies. We discuss the difference between RTAP and classic task allocation problems and present a heuristic algorithm with a computational complexity of O(n log2 n) to solve the problem. Extensive experiments were conducted to verify the correctness and effectiveness of our approach. It improves the resource usage by 15%–20% on average.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Stonebraker M, Çetintemel U, Zdonik S. The 8 requirements of real-time stream processing. ACM SIGMOD Record, 2005, 34(4): 42-47.
Arasu A, Babcock B, Babu S, Datar M, Ito K, Nishizawa I et al. STREAM: The Stanford stream data manager (demonstration description). In Proc. ACM SIGMOD International Conference on Management of Data, June 2003, pp.665-665.
Hesse G, Lorenz M. Conceptual survey on data stream processing systems. In Proc. the 21st IEEE International Conference on Parallel and Distributed Systems, January 2015, pp.797-802.
Chandrasekaran S, Cooper O, Deshpande A et al. TelegraphCQ: Continuous dataflow processing. In Proc. the 2003 ACM SIGMOD International Conference on Management of Data, June 2003, pp.668-668.
Akidau T, Balikov A, Bekiroğlu K, Chernyak S et al. Mill-Wheel: Fault-tolerant stream processing at Internet scale. Proceedings of the VLDB Endowment, 2013, 6(11): 1033-1044.
Toshniwal A, Taneja S, Shukla A, Ramasamy K et al. Storm@ Twitter. In Proc. ACM SIGMOD International Conference on Management of Data, June 2014, pp.147-156.
Neumeyer L, Robbins B, Nair A, Kesari A. S4: Distributed stream computing platform. In Proc. IEEE International Conference on Data Mining Workshops, Dec. 2010, pp.170-177.
Kulkarni S, Bhagat N, Fu M, Kedigehalli V et al. Twitter heron: Stream processing at scale. In Proc. ACM SIGMOD International Conference on Management of Data, May 2015, pp.239-250.
Zhao J, Ou S, Hu L, Ding Y, Xu G. A heuristic placement selection approach of partitions of mobile applications in mobile cloud computing model based on community collaboration. Cluster Computing, 2017, 20(4): 3131-3146.
Eidenbenz R, Locher T. Task allocation for distributed stream processing. In Proc. the 35th Annual IEEE International Conference on Computer Communications, April 2016.
Hwang J, Balazinska M, Rasin A, Cetintemel U, Stonebraker M, Zdonik S. High-availability algorithms for distributed stream processing. In Proc. the 21st IEEE International Conference on Data Engineering, April 2005, pp.779-790.
Babcock B, Babu S, Motwani R, Datar M. Chain: Operator scheduling for memory minimization in data stream systems. In Proc. ACM SIGMOD International Conference on Management of Data, June 2003, pp.253-264.
Cardellini V, Grassi C, Presti L F, Nardelli M. Optimal operator placement for distributed stream processing applications. In Proc. the 10th ACM International Conference on Distributed and Event-Based Systems, June 2016, pp.69-80.
Chatzistergiou A, Viglas S D. Fast heuristics for near-optimal task allocation in data stream processing over clusters. In Proc. the 23rd ACM International Conference on Information and Knowledge Management, Nov. 2014, pp.1579-1588.
Lohrmann B, Janacik P, Kao O. Elastic stream processing with latency guarantees. In Proc. the 35th IEEE Distributed Computing Systems, June 2015, pp.399-410.
Li H, Wu J, Jiang Z, Li X, Wei X, Zhuang Y. Integrated recovery and task allocation for stream processing. In Proc. the 36th IEEE Performance Computing and Communications Conference, Dec. 2017.
Li H, Wu J, Jiang Z, Li X, Wei X. Minimum backups for stream processing with recovery latency guarantees. IEEE Transactions on Reliability, 2017, 66(3): 783-94.
Sun D, Zhang G, Wu C, Li K, Zheng W. Building a fault tolerant framework with deadline guarantee in big data stream computing environments. Journal of Computer and System Sciences, 2017, 89: 4-23.
Krempl G, Žliobaite I, Brzeziński D, Hüllermeier E et al. Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter, 2014, 16(1): 1-10.
Li H, Wu J, Jiang Z, Li X, Wei X. Task allocation for stream processing with recovery latency guarantee. In Proc. IEEE International Conference on Cluster Computing, Sept. 2017, pp.379-383.
Ananthanarayanan R, Basker V, Das S, Gupta A, Jiang H, Qiu T, Reznichenko A, Ryabkov D, Singh M, Venkataraman S. Photon: Fault-tolerant and scalable joining of continuous data streams. In Proc. ACM SIGMOD International Conference on Management of Data, June 2013, pp.577-588.
Qian Z, He Y, Su C, Wu Z, Zhu H, Zhang T, Zhou L, Yu Y, Zhang Z. TimeStream: Reliable stream computation in the cloud. In Proc. the 8th ACM European Conference on Computer Systems, Apr. 2013, pp.1-14.
Su L, Zhou Y. Tolerating correlated failures in massively parallel stream processing engines. In Proc. the 32nd IEEE International Conference on Data Engineering, May 2016, pp.517-528.
Upadhyaya P, Kwon Y, Balazinska M. A latency and fault-tolerance optimizer for online parallel query plans. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.241-252.
Fernandez C R, Migliavacca M, Kalyvianaki E, Pietzuch P. Integrating scale out and fault tolerance in stream processing using operator state management. In Proc. ACM SIGMOD International Conference on Management of Data, June 2013, pp.725-736.
Balazinska M, Balakrishnan H, Madden S R, Stonebraker M. Fault-tolerance in the Borealis distributed stream processing system. ACM Transactions on Database Systems, 2008, 33(1): Article No. 3.
Heinze T, Zia M, Krahn R, Jerzak Z, Fetzer C. An adaptive replication scheme for elastic data stream processing systems. In Proc. the 9th ACM International Conference on Distributed Event-Based Systems, June 2015, pp.150-161.
Stanoi I, Mihaila G, Palpanas T, Lang C. WhiteWater: Distributed processing of fast streams. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(9): 1214-1226.
de Matteis T, Mencagli G. Proactive elasticity and energy awareness in data stream processing. Journal of Systems and Software, 2017, 127: 302-319.
Wu J. Distributed System Design. CRC Press, Inc., 2017.
Wu J, Huang K. The balanced hypercube: A cube-based system for fault-tolerant applications. IEEE Transactions on Computers, 1997, 46(4): 484-90.
Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K. Apache Flink™: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015, 36(4): 28-38.
Zhuang Y, Wei X, Li H, Wang Y, He X. An optimal check-pointing model with online OCI adjustment for stream processing applications. In Proc. the 27th IEEE International Conference on Computer Communication and Networks, July 2018.
Salama A, Binnig C, Kraska T, Zamanian E. Cost-based fault-tolerance for parallel data processing. In Proc. ACM SIGMOD International Conference on Management of Data, May 2015, pp.285-297.
Young J W. A first order approximation to the optimum checkpoint interval. Communications of the ACM, 1974, 17(9): 530-531.
Xu H, Xing L, Huang L. Regional science and technology resource allocation optimization based on improved genetic algorithm. KSII Transactions on Internet & Information Systems, 2017, 11(4): 1972-1986.
Jin Z, Xu G, Li Y, Liu P. A novel cloud scheduling algorithm optimization for energy consumption of data centres based on user QoS priori knowledge under the background of WSN and mobile communication. Cluster Computing, 2017, 20(2): 1587-1597.
Christensen H I, Khan A, Pokutta S, Tetali P. Approximation and online algorithms for multidimensional bin packing: A survey. Computer Science Review, 2017, 24: 63-79.
Lodi A, Martello S, Monaci M. Two-dimensional packing problems: A survey. European Journal of Operational Research, 2002, 141(2): 241-252.
Coffman Jr E G, Csirik J, Galambos G, Martello S, Vigo D. Bin packing approximation algorithms: Survey and classification. In Handbook of Combinatorial Optimization, Pardalos P A, Du D Z, Graham R L (eds.), Springer, 2013, pp.455-531.
Johnson D S. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 1974, 9(3): 256-278.
Chung F R, Garey M R, Johnson D S. On packing two-dimensional bins. SIAM Journal on Algebraic Discrete Methods, 1982, 3(1): 66-76.
Garey M R, Graham R L, Ullman J D. Worst-case analysis of memory allocation algorithms. In Proc. the 4th ACM Symposium on Theory of computing, May 1972, pp.143-150.
Jiang Y, Huang Z, Tsang D H. Towards max-min fair resource allocation for stream big data analytics in shared clouds. IEEE Transactions on Big Data, 2016, 4(1): 130-137.
Kreps J, Narkhede N, Rao J. Kafka: A distributed messaging system for log processing. In Proc. the 6th Workshop on Networking Meets Databases, June 2011.
Sadykov R, Vanderbeck F. Bin packing with conflicts: A generic branch-and-price algorithm. INFORMS Journal on Computing, 2013, 25(2): 244-255.
Koukoumidis E, Peh L S, Martonosi M R. SignalGuru: Leveraging mobile phones for collaborative traffic signal schedule advisory. In Proc. the 9th ACM International Conference on Mobile Systems, Applications, and Services, June 2011, pp.127-140.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
ESM 1
(PDF 182 kb)
Rights and permissions
About this article
Cite this article
Li, HL., Wu, J., Jiang, Z. et al. A Task Allocation Method for Stream Processing with Recovery Latency Constraint. J. Comput. Sci. Technol. 33, 1125–1139 (2018). https://doi.org/10.1007/s11390-018-1876-6
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-018-1876-6