Abstract
Recent work in network measurement focuses on scaling the performance of monitoring platforms to 10 Gb/s and beyond. Concurrently, the IT community focuses on scaling the analysis of big data over a cluster of nodes. So far, combinations of these approaches have favored flexibility and usability over real-time delivery of results and efficient allocation of resources. In this paper we show how to meet both objectives with BlockMon, a network monitoring platform originally designed to work on a single node, which we extended to run distributed stream-data analytics tasks. We compare its performance against Storm and Apache S4, the state-of-the-art open-source stream-processing platforms, by implementing a phone call anomaly detection system and a Twitter trending algorithm: our enhanced BlockMon achieves a performance gain of over 2.5x and 23x, respectively. Given the different nature of those applications and the performance of BlockMon as a single-node network monitor [1], we expect our results to hold for a broad range of applications, making distributed BlockMon a good candidate for the convergence of network-measurement and IT-analysis platforms.
References
- A. di Pietro, F. Huici, N. Bonelli, B. Trammell, P. Kastovsky, T. Groleat, S. Vaton, and M. Dusi. Blockmon: Toward high-speed composable network traffic measurement. In Proceedings of the IEEE Infocom Conference (mini-conference), 2013.
- G. Iannaccone. Fast prototyping of network data mining applications. In Proceedings of the Passive and Active Measurement Conference, 2006.
- N. Bonelli, A. Di Pietro, S. Giordano, and G. Procissi. On multi-gigabit packet capturing with multi-core commodity hardware. In Proceedings of the Passive and Active Measurement Conference, 2012.
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, 2008.
- Apache Hadoop. http://hadoop.apache.org (accessed 2012-11-10).
- T. Condie, N. Conway, P. Alvaro, J. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce online. In Proceedings of the USENIX NSDI Conference, 2010.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the USENIX NSDI Conference, 2012.
- Storm. http://storm-project.net (accessed 2012-11-10).
- L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In Proceedings of the International Conference on Data Mining Workshops, 2010.
- G. Bianchi, N. d'Heureuse, and S. Niccolini. On-demand time-decaying Bloom filters for telemarketer detection. Comput. Commun. Rev., 41(5):5-12, Sep. 2011.
- FP7 Demons Project. http://fp7-demons.eu (accessed 2012-11-10).
- BlockMon. http://blockmon.github.com/blockmon (accessed 2012-11-10).
- The 0MQ Project. http://www.zeromq.org.
- The Nimbus Project. http://www.nimbusproject.org.
- Apache S4. http://incubator.apache.org/s4 (accessed 2012-11-10).
- GNIP. http://gnip.com.
- Kestrel Queues. https://github.com/robey/kestrel.
- D. Eyers, T. Freudenreich, A. Margara, S. Frischbier, P. Pietzuch, and P. Eugster. Living in the present: on-the-fly information processing in scalable web architectures. In Proceedings of the ACM International Workshop on Cloud Computing Platforms, 2012.
- Scribe. https://github.com/facebook/scribe.
- J. Kreps, N. Narkhede, and J. Rao. Kafka: A distributed messaging system for log processing. In Proceedings of the International Workshop on Networking Meets Databases, 2011.
- Cloud MapReduce. http://code.google.com/p/cloudmapreduce.
- HStreaming. http://www.hstreaming.com.
- Brisk. http://www.datastax.com/products/enterprise.
- C. Bockermann and H. Blom. Processing data streams with the RapidMiner streams-plugin. In Proceedings of the RapidMiner Community Meeting and Conference, 2012.
- Y. Lee and Y. Lee. Toward scalable internet traffic measurement and analysis with Hadoop. Comput. Commun. Rev., 43(1):5-13, Jan. 2013.