ABSTRACT
In this paper we propose a streaming approach for the real-time processing of huge amounts of data. CATENAE is a library for easily building and executing Python topologies (e.g., a web crawler or a classifier). Topologies are designed to be deployed inside Docker containers, so horizontal scaling, granular resource assignment and isolation can be achieved easily. Furthermore, each micromodule can have its own dependencies (including the Python version), and the user can limit resources such as CPU or memory per instance. We describe an implementation of a use case composed of two topologies: (1) a crawler for tracking users in social media and (2) an early risk detector of depression. We also explain how CATENAE topologies can be connected to non-Python systems.
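To make the idea of a topology concrete, the following is a minimal sketch of small processing stages chained through queues. It does not use the CATENAE API: the names `run_stage`, `run_topology`, `normalize` and `flag_risk` are hypothetical, in-process `queue.Queue` objects stand in for a message broker, threads stand in for containers, and the keyword rule is a toy placeholder for a real classifier.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def run_stage(transform, inbox, outbox):
    """Consume items from inbox, apply transform, forward results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        result = transform(item)
        if result is not None:  # None means "drop this item"
            outbox.put(result)


def normalize(post):
    """Hypothetical first stage: lowercase the text of a post."""
    return {"user": post["user"], "text": post["text"].lower()}


def flag_risk(post):
    """Hypothetical second stage: toy keyword rule, not a real model."""
    return post["user"] if "hopeless" in post["text"] else None


def run_topology(posts, stages):
    """Chain the stages with queues, feed the posts, collect the output."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for post in posts:
        queues[0].put(post)
    queues[0].put(STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Because each stage only reads from one queue and writes to another, a stage could in principle be moved into its own container with its own dependencies and resource limits, which is the property the abstract emphasizes.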
Index Terms
- Building Python-Based Topologies for Massive Processing of Social Media Data in Real Time