I-Scheduler: Iterative scheduling for distributed stream processing systems
Introduction
In the era of big data, with streaming applications such as social media, surveillance monitoring and real-time search generating large volumes of data, efficient Data Stream Processing Systems (DSPSs) have become essential. More than 30,000 GB of data is generated every second and the rate is accelerating [2]. According to IBM, 90% of the data that existed in 2012 was created in the two years prior. These data sources need to be analysed to gain insights and find trends, such as determining the most frequent events in a continuous dataflow over a certain period of time. DSPSs are designed to process such dataflows by operating on data streams, which are continuous, unbounded sequences of data items, each with a number of data attributes, processed in the order in which they arrive. In comparison, batch processing systems store the data before performing ad hoc queries, which makes them unsuited to real-time analysis.
A general purpose DSPS faces a number of competing challenges, such as task allocation, scalability, fault tolerance, QoS, parallelism degree, and state management, among others. It is not possible to optimise for all of these challenges at the same time, as there will be tradeoffs. The specific streaming application requirements determine which challenges need to be addressed. While each of these challenges is a current area of research, we focus on scheduling in this paper, as low latency response times are a consistent priority across many streaming applications. The scheduling policy determines how tasks are distributed in the DSPS, which can have a significant impact on the performance metrics of the system, such as tuple latency (the time taken to process a tuple) and system throughput (the number of tuples processed in a given time) [3]. A scheduling policy needs to strike a balance between system performance, the use of system resources and run-time overhead.
Finding an optimal placement for DSPSs is NP-hard [4], [5], [6]. Thus, approximate approaches are required to improve the performance of DSPSs [7]. An efficient task scheduler will adapt to changes in the communication pattern of a streaming application, ensuring that the communication between compute nodes, referred to as inter-node communication, is minimised. Specifically, by placing highly communicating tasks on the same node, communication between compute nodes can be reduced [8], [9], [10], [11]. The term “highly communicating tasks”, which is used throughout this paper, refers to a pair or group of tasks which exchange a larger amount of data than other neighbouring tasks. Additionally, prioritising the use of higher capacity compute nodes allows more highly communicating tasks to be co-located within a compute node, requiring fewer nodes to be used, which helps to further reduce inter-node communication. To achieve this, the scheduler monitors the run-time communication of a streaming application, logging the communication rates between tasks and tasks’ load, which is then used when rescheduling.
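The effect of co-locating highly communicating tasks can be illustrated with a minimal sketch. The task names, node names and tuple rates below are hypothetical, and the function is only an illustration of the inter-node communication metric described above, not the scheduler's implementation.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def inter_node_traffic(rates: Dict[Edge, float],
                       assignment: Dict[str, str]) -> float:
    """Sum the communication rates of task pairs placed on different nodes."""
    return sum(rate for (src, dst), rate in rates.items()
               if assignment[src] != assignment[dst])


# Hypothetical linear topology; values are tuples per second between tasks.
rates = {
    ("spout", "parse"): 100.0,
    ("parse", "count"): 900.0,   # the highly communicating pair
    ("count", "sink"): 50.0,
}

# Placement that splits the highly communicating pair across nodes.
split = {"spout": "n1", "parse": "n1", "count": "n2", "sink": "n2"}
# Placement that keeps the highly communicating pair on the same node.
colocated = {"spout": "n1", "parse": "n2", "count": "n2", "sink": "n1"}

split_cost = inter_node_traffic(rates, split)          # 900.0
colocated_cost = inter_node_traffic(rates, colocated)  # 150.0
```

Both placements use two nodes with two tasks each, yet keeping `parse` and `count` together lowers the traffic crossing the network from 900 to 150 tuples per second in this toy example.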
Existing task schedulers that aim to minimise inter-node communication have a number of limitations. Firstly, the compute nodes might be underutilised, which results in more compute nodes being used than required. Secondly, many of the schedulers are not designed for heterogeneous clusters, which is an important requirement for many deployments, as clusters evolve over time when new hardware is added. Further, a multi-user homogeneous cluster where not all of the system resources are available can be viewed as a heterogeneous system. Finally, offline schedulers are incapable of adapting to run-time changes in the traffic patterns of streaming applications. In this paper, we aim to address these limitations and propose I-Scheduler, for DAG-based DSPSs, which partitions the DAG so as to minimise the communication between parts, such that inter-node communication is minimised when each part is assigned to a compute node. The contributions of this paper are summarised as follows:
- We propose I-Scheduler, for homogeneous and heterogeneous DSPSs, which reduces the size of the task graph by fusing highly communicating tasks, allowing mathematical optimisation software to be used to find an efficient task assignment. A fallback heuristic is also proposed for cases where the optimisation software cannot be used, which iteratively partitions the application graph based on the capacity of the heterogeneous nodes and assigns each partition to a node of corresponding relative capacity.
- We evaluate the communication cost of I-Scheduler by comparing it to a theoretically optimal scheduler, implemented in CPLEX, when run on three micro-benchmarks, each representing a different communication pattern. The results show that I-Scheduler can achieve results that are close to optimal in a number of different cluster configurations.
- We implement the proposed scheduler in Apache Storm 1.1.1 and, through experimental results, show that it can outperform the state-of-the-art R-Storm [9] and Aniello et al.’s ‘Online scheduler’ [8] (for brevity, we refer to this scheduler as OLS in this paper). For the real-world applications, I-Scheduler increases throughput by 20%–86% over OLS and by 3%–30% over R-Storm.
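The general idea of shrinking a task graph by fusing highly communicating tasks can be sketched as follows. This is a simplified greedy pairwise fusion under assumed inputs (the task names, loads, `capacity` and `target_size` are all hypothetical). I-Scheduler itself fuses groups of tasks through iterative graph partitioning, which this sketch does not reproduce.

```python
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]
Graph = Tuple[Dict[Edge, float], Dict[str, float]]  # (edge rates, task loads)


def fuse_heaviest_pair(rates: Dict[Edge, float], load: Dict[str, float],
                       capacity: float) -> Optional[Graph]:
    """Fuse the highest-rate pair whose combined load fits within capacity."""
    for (u, v), _ in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        if load[u] + load[v] > capacity:
            continue  # the fused task would not fit on one node
        merged = f"{u}+{v}"
        new_load = {t: l for t, l in load.items() if t not in (u, v)}
        new_load[merged] = load[u] + load[v]
        new_rates: Dict[Edge, float] = {}
        for (a, b), r in rates.items():
            a2 = merged if a in (u, v) else a
            b2 = merged if b in (u, v) else b
            if a2 == b2:
                continue  # traffic is now internal to the fused task
            new_rates[(a2, b2)] = new_rates.get((a2, b2), 0.0) + r
        return new_rates, new_load
    return None  # no pair can be fused without exceeding capacity


def coarsen(rates: Dict[Edge, float], load: Dict[str, float],
            capacity: float, target_size: int) -> Graph:
    """Repeat the fusion until the graph is small enough or nothing fits."""
    while len(load) > target_size:
        fused = fuse_heaviest_pair(rates, load, capacity)
        if fused is None:
            break
        rates, load = fused
    return rates, load


# Hypothetical four-task linear topology with unit loads.
rates = {("spout", "parse"): 100.0, ("parse", "count"): 900.0,
         ("count", "sink"): 50.0}
load = {"spout": 1.0, "parse": 1.0, "count": 1.0, "sink": 1.0}
small_rates, small_load = coarsen(rates, load, capacity=4.0, target_size=2)
```

With these inputs the graph shrinks from four tasks to two, leaving only the lightest edge between them; a smaller graph of this kind is what makes handing the assignment to optimisation software more tractable.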
The rest of this paper is organised as follows. In Section 2, we discuss key related work. We then present the problem definition in Section 3. Section 4 describes our proposed scheduler, which is then followed by a comparison with an optimal scheduler in Section 5. Section 6 evaluates our proposed scheduler with two real world applications. Finally, Section 7 concludes the paper.
Section snippets
Related work
There is a broad range of research on task placement in DSPSs with different optimisation goals. Generally, there are three main approaches to tackle each optimisation problem [12]: mathematical programming, graph-based approximation and heuristics. In the following, we discuss a number of scheduling algorithms that use different approaches to meet specific optimisation goals.
In the mathematical programming approach, the scheduling problem is formulated as an optimisation problem and solved
System model and problem definition
Data stream processing systems, such as Apache S4 [32], Flink, Storm and Twitter Heron [33], process large volumes of data, on the fly, as it flows through the system, without first being stored. This makes them ideal for providing real-time analysis of fast flowing information, with a common example being Twitter’s top trending topics, where real-time streams of tweets are analysed to determine the most popular topics at a given time.
I-Scheduler algorithm
As discussed in Section 3, using optimisation software to find an optimal solution to the scheduling problem is often not practical. One approach is to reduce the size of the task graph by fusing each pair of tasks into a single task [12]. However, this has a number of drawbacks. Firstly, fusing task pairs takes a localised view of the task communications and may be unreliable at placing communicating tasks within the same node. Secondly, this approach may not be able to scale to larger problem
I-Scheduler versus optimal scheduler
In this section, we compare the communication cost and resolution time of I-Scheduler with a theoretical optimal scheduler when run on three micro-benchmarks. Each of these micro-benchmarks, shown in Fig. 3, is named after the shape of its topology and communication pattern (linear, diamond and star) and is based on the implementation originally presented in [9]. These are common communication patterns that can be found in real world applications, which consist of either just one of these
Experimental evaluation
In this section, we evaluate I-Scheduler on a homogeneous and heterogeneous cluster, using three micro-benchmarks and two real-world applications. These results are then compared with Aniello et al.’s ‘Online scheduler’ (referred to as ‘OLS’) [8] and R-Storm [9], previously discussed in Section 2, which were chosen for two main reasons. Firstly, OLS and R-Storm have the same optimisation criteria as I-Scheduler, which is to reduce the inter-node communication. Secondly, the source code for both
Conclusions and future work
In this paper, we have presented I-Scheduler, which iteratively uses graph partitioning to efficiently fuse groups of highly communicating tasks into a single task, in order to reduce the task size of the application graph. Once the task graph is sufficiently small, optimisation software is able to determine the task assignment within a user defined time tolerance. In cases where the task graph is too large to be solved by the optimisation software or the user cannot tolerate any delays in
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (44)
- QoS-aware resource allocation for stream processing engines using priority channels
- Re-Stream: Real-time and energy-efficient resource scheduling in big data stream computing environments. Inform. Sci. (2015)
- Improving the predictability of distributed stream processors. Future Gener. Comput. Syst. (2015)
- Scheduling linear chain streaming applications on heterogeneous systems with failures. Future Gener. Comput. Syst. (2013)
- T3-Scheduler: A Topology and Traffic aware Two-level Scheduler for stream processing systems in a heterogeneous cluster. Future Gener. Comput. Syst. (2018)
- Multilevel K-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. (1998)
- Model-driven scheduling for distributed stream processing systems. J. Parallel Distrib. Comput. (2018)
- Self-adaptive processing graph with operator fission for elastic stream processing. J. Syst. Softw. (2017)
- Iterative scheduling for distributed stream processing systems
- Big Data: Principles and Best Practices of Scalable Realtime Data Systems (2015)
- Stream Data Processing: A Quality of Service Perspective: Modeling, Scheduling, Load Shedding, and Complex Event Processing, Vol. 36
- Computers and intractability: A guide to the theory of NP-completeness. J. Symbolic Logic
- Operator placement for in-network stream query processing
- Task allocation for distributed stream processing
- Placement strategies for internet-scale data stream systems. IEEE Internet Comput.
- R-Storm: Resource-aware scheduling in Storm
- T-Storm: Traffic-aware online scheduling in Storm
- Fast heuristics for near-optimal task allocation in data stream processing over clusters
- Task allocation in distributed data processing. IEEE Comput.
- Optimal operator placement for distributed stream processing applications
- Optimal operator replication and placement for distributed stream processing systems
Cited by (9)
An energy efficient and runtime-aware framework for distributed stream computing systems
2022, Future Generation Computer Systems
Citation excerpt: In [28], an adaptive online scheme was proposed to schedule and enforce resource allocation in stream processing systems, which guaranteed that the system could achieve less congestion under heavy load and less resource waste under light load. In [29], the task assignment problem under stream computing systems was mainly studied by finding highly communicative tasks for which tasks were assigned by using graph partitioning algorithms and mathematical packages. However, the elastic variation of the data stream and the energy consumption of the compute nodes were likewise not considered.
Recent implications towards sustainable and energy efficient AI and big data implementations in cloud-fog systems: A newsworthy inquiry
2022, Journal of King Saud University - Computer and Information Sciences
Citation excerpt: ExpREsS (Maroulis et al., 2019) is a proactive scheduler deploying Random Forest on stream time series to estimate execution time and energy consumption, then a modified DVFS technique to reduce the CPU cores voltage-frequency. I-Scheduler (Eskandari et al., 2021) attempts to merge highly communicating stream tasks for efficient workload mapping across heterogeneous flow nodes. Re-Stream (Sun et al., 2015), in turn, consolidates non-critical vertices above non-critical path in order to enhance energy efficiency.
Cost-Efficient Scheduling of Streaming Applications in Apache Flink on Cloud
2023, IEEE Transactions on Big Data
WG-Storm Scheduler for Distributed Stream Processing Engines
2023, Research Square
Leila Eskandari received her Ph.D. in Computer Science from the University of Otago, New Zealand. She previously completed an M.Sc. degree from Sharif University of Technology in Network Engineering and a B.Sc. degree from Ferdowsi University of Mashhad in Computer Engineering, Iran. Her research interests include big data processing, distributed and cloud computing and computer networks.
Jason Mair received his Ph.D. in Computer Science in 2015 and a B.Sc. Honors degree in 2010 from the University of Otago, New Zealand. His research interests include multicore architectures, green computing, big data processing, distributed and cloud computing.
Zhiyi Huang received the B.Sc. degree in 1986 and the Ph.D. degree in 1992 in computer science from the National University of Defense Technology (NUDT) in China. He is an Associate Professor at the Department of Computer Science, University of Otago, New Zealand. He was a visiting professor at EPFL (Swiss Federal Institute of Technology Lausanne) and Tsinghua University in 2005, and a visiting scientist at MIT CSAIL in 2009. His research fields include parallel/distributed computing, multicore architectures, operating systems, green computing, cluster/grid/cloud computing, high-performance computing, and computer networks.
David Eyers received his Ph.D. in Computer Science in 2006 from the University of Cambridge, UK, having previously attained his BE in Computer Engineering from the University of New South Wales in Sydney, Australia. He is an Associate Professor at the Department of Computer Science at the University of Otago in New Zealand, and a visiting research fellow at the University of Cambridge Computer Laboratory. He has broad research interests including green computing, information flow control, network security, and distributed and cloud computing.