Abstract
This paper describes the design and implementation of R functions for the analysis and visualization of Twitter feeds, combining analytical technologies with big data processing tools. The main idea was to use the storage and computational capabilities of the Hadoop processing framework in analytical tasks designed and implemented in the R language. To this end, we used Hadoop HDFS for storage and MapReduce v2 for the processing logic, connected via the Tessera framework to analytical functions written in R. The results of the analysis are presented as graph visualizations, implemented with the Trelliscope framework, which supports fast and flexible visualization of large, complex data in the R environment.
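The pipeline described above follows a divide, transform, and recombine pattern: data is split into subsets, an analytical function is applied to each subset, and the per-subset results are merged for visualization. The following is a minimal conceptual sketch of that pattern, written in plain Python rather than R and running in memory rather than on Hadoop. The sample records and the function names `divide`, `apply_transform`, `recombine`, and `tweets_per_hour` are hypothetical and do not correspond to the Tessera API or to the authors' implementation.

```python
from collections import defaultdict

# Hypothetical sample records standing in for tweets stored on HDFS.
TWEETS = [
    {"user": "alice", "hour": 9, "text": "Traffic jam downtown"},
    {"user": "bob", "hour": 9, "text": "Coffee time"},
    {"user": "carol", "hour": 10, "text": "Meeting started"},
    {"user": "alice", "hour": 17, "text": "Heading home"},
]


def divide(records, key):
    """Divide step: group records into subsets by the value of `key`."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record)
    return dict(subsets)


def apply_transform(subsets, fn):
    """Transform step: apply `fn` to each subset independently.

    On a cluster this is the part that would run in parallel.
    """
    return {k: fn(subset) for k, subset in subsets.items()}


def recombine(results):
    """Recombine step: merge per-subset results into one sorted list."""
    return sorted(results.items())


def tweets_per_hour(records):
    """Count tweets per hour using divide, transform, recombine."""
    by_hour = divide(records, "hour")
    counts = apply_transform(by_hour, len)
    return recombine(counts)


if __name__ == "__main__":
    for hour, count in tweets_per_hour(TWEETS):
        print(hour, count)
```

Because each subset is processed independently, the transform step is the natural place to distribute work across cluster nodes, while the recombined output is small enough to be passed to a plotting layer.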
Notes
- 1. Tessera: http://tessera.io/.
- 2. R project: https://www.r-project.org/.
- 3. Datadr: http://tessera.io/docs-datadr/.
- 5. Trelliscope: http://tessera.io/docs-trelliscope/.
- 6. UrbanSensing project: http://urban-sensing.eu/.
Acknowledgments
The work presented in this paper was supported by the KEGA project under grant No. 025TUKE-4/2015 and also by the VEGA project under grant No. 1/0493/16.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sarnovsky, M., Butka, P., Paulina, J. (2017). Social-Media Data Analysis Using Tessera Framework in the Hadoop Cluster Environment. In: Grzech, A., Świątek, J., Wilimowska, Z., Borzemski, L. (eds) Information Systems Architecture and Technology: Proceedings of 37th International Conference on Information Systems Architecture and Technology – ISAT 2016 – Part II. Advances in Intelligent Systems and Computing, vol 522. Springer, Cham. https://doi.org/10.1007/978-3-319-46586-9_19
DOI: https://doi.org/10.1007/978-3-319-46586-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46585-2
Online ISBN: 978-3-319-46586-9
eBook Packages: Engineering, Engineering (R0)