Abstract:
Distributed stream processing platforms is a new class of real-time monitoring systems that analyze and extracts knowledge from large continuous streams of data. This typ...Show MoreMetadata
Abstract:
Distributed stream processing platforms is a new class of real-time monitoring systems that analyze and extracts knowledge from large continuous streams of data. This type of systems is crucial for providing high throughput and low latency required by Big Data or Internet of Things monitoring applications. This paper describes and analyzes three main open-source distributed stream- processing platforms: Storm Flink, and Spark Streaming. We analyze the system architectures and we compare their main features. We carry out two experiments concerning anomaly detection on network traffic to evaluate the throughput efficiency and the resilience to node failures. Results show that the performance of native stream processing systems, Storm and Flink, is up to 15 times higher than the micro-batch processing system, Spark Streaming. On the other hand, Spark Streaming is more robust to node failures and provides recovery without losses.
Published in: 2016 IEEE Global Communications Conference (GLOBECOM)
Date of Conference: 04-08 December 2016
Date Added to IEEE Xplore: 06 February 2017
ISBN Information: