Social-Media Data Analysis Using Tessera Framework in the Hadoop Cluster Environment

Sarnovsky, Martin; Butka, Peter; Paulina, Jakub

doi:10.1007/978-3-319-46586-9_19

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 522)

Abstract

This paper describes the design and implementation of R functions for the analysis and visualization of Twitter feeds, combining analytical technologies with big data processing tools. The main idea was to use the Hadoop processing framework and its storage and computational capabilities in analytical tasks designed and implemented in the R language. To this end, we used Hadoop HDFS for storage and MapReduce v2 for handling the processing logic, connected via the Tessera framework to analytical functions written in R. The results of the analysis were presented as graph visualizations, implemented using the Trelliscope framework, which supports fast and flexible visualization of large, complex data in the R environment.
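To make the MapReduce processing model mentioned in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' R/Tessera implementation and does not use Hadoop. The task (counting hashtags), the sample tweets, and all function names are assumptions chosen purely for illustration. The abstract does not say which analyses were actually performed.

```python
import re
from collections import defaultdict

# Hypothetical pattern for illustration: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def map_tweet(tweet):
    """Map step: emit one (hashtag, 1) pair per hashtag found in a tweet."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(tweet)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def count_hashtags(tweets):
    """Run map, shuffle, and reduce in a single process over a list of tweets."""
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


# Made-up sample input, not data from the paper.
SAMPLE_TWEETS = [
    "Loading data into #Hadoop with #R",
    "#hadoop cluster is up",
    "No tags here",
]

if __name__ == "__main__":
    print(count_hashtags(SAMPLE_TWEETS))
```

On a real cluster the map and reduce steps would run in parallel over data blocks stored in HDFS, and the framework itself would perform the shuffle. Here all three steps run sequentially in one process so that the data flow is easy to follow.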

Notes

1. Tessera: http://tessera.io/
2. R project: https://www.r-project.org/
3. Datadr: http://tessera.io/docs-datadr/
4. RHIPE: http://tessera.io/docs-RHIPE/
5. Trelliscope: http://tessera.io/docs-trelliscope/
6. UrbanSensing project: http://urban-sensing.eu/
7. http://bokeh.pydata.org/
8. http://hafen.github.io/rbokeh/
9. http://leafletjs.com/

Acknowledgments

The work presented in this paper was supported by the KEGA project under grant No. 025TUKE-4/2015 and also by the VEGA project under grant No. 1/0493/16.

Author information

Authors and Affiliations

Department of Cybernetics and Artificial Intelligence, Faculty of Electrical Engineering and Informatics, Technical University of Kosice, Letna 9/A, 04200, Kosice, Slovakia
Martin Sarnovsky, Peter Butka & Jakub Paulina

Corresponding author

Correspondence to Martin Sarnovsky.

Editor information

Editors and Affiliations

Department of Computer Science, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Adam Grzech & Jerzy Świątek
Department of Management Systems, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Zofia Wilimowska
Department of IT and Management, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Leszek Borzemski

About this paper

Cite this paper

Sarnovsky, M., Butka, P., Paulina, J. (2017). Social-Media Data Analysis Using Tessera Framework in the Hadoop Cluster Environment. In: Grzech, A., Świątek, J., Wilimowska, Z., Borzemski, L. (eds) Information Systems Architecture and Technology: Proceedings of 37th International Conference on Information Systems Architecture and Technology – ISAT 2016 – Part II. Advances in Intelligent Systems and Computing, vol 522. Springer, Cham. https://doi.org/10.1007/978-3-319-46586-9_19

DOI: https://doi.org/10.1007/978-3-319-46586-9_19
Published: 24 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46585-2
Online ISBN: 978-3-319-46586-9
eBook Packages: Engineering, Engineering (R0)
