Abstract
Real-time online data processing is quickly becoming an essential tool in the analysis of social media for political trends, advertising, public health awareness programs and policy making. Traditionally, offline analysis is productive and efficient only when data collection is a one-time process. Current cutting-edge research, however, requires real-time data analysis, which brings its own challenges. Chief among them is the efficiency of continuous data fetching with present NoSQL and relational databases. In this paper, we demonstrate a solution that effectively addresses the challenges of real-time analysis using a configurable Elasticsearch search engine. We use a distributed database architecture and pre-built indexing, and we standardize the Elasticsearch framework for large-scale text mining. The results from the query engine are visualized in near real time.
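The abstract names three ingredients: a distributed index, indexing that is set up before data arrives, and queries over continuously fetched posts. The paper's actual index layout is not given here, so the following is only a minimal sketch under stated assumptions. It builds, with the standard library alone, the JSON bodies one might send to Elasticsearch's create-index, `_bulk`, and `_search` endpoints. The index name `tweets`, the fields `text`, `hashtags`, `user` and `created_at`, the shard and replica counts, and the 15-minute window are all hypothetical. The mapping uses the typeless format of Elasticsearch 7 and later. The 6.2 release cited in the references additionally expects a mapping type name inside `mappings`.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical index name; the paper does not specify one.
INDEX = "tweets"


def index_definition(shards: int = 3, replicas: int = 1) -> Dict:
    """Body for creating the index up front (PUT /tweets), so incoming
    posts are indexed against a fixed, pre-built mapping (assumed layout)."""
    return {
        "settings": {
            "number_of_shards": shards,      # spreads the index across cluster nodes
            "number_of_replicas": replicas,  # extra copies for fault tolerance
            "refresh_interval": "1s",        # new documents become searchable about 1 s later
        },
        # Typeless mapping (Elasticsearch 7+); 6.x needs a type-name level here.
        "mappings": {
            "properties": {
                "text": {"type": "text", "analyzer": "english"},  # full-text search
                "hashtags": {"type": "keyword"},                  # exact values, aggregatable
                "user": {"type": "keyword"},
                "created_at": {"type": "date"},
            }
        },
    }


def bulk_payload(posts: Iterable[Dict]) -> str:
    """NDJSON body for POST /_bulk: one action line plus one source line per post."""
    lines: List[str] = []
    for post in posts:
        lines.append(json.dumps({"index": {"_index": INDEX, "_id": post["id"]}}))
        # The document id goes in the action line, not in the stored source.
        lines.append(json.dumps({k: v for k, v in post.items() if k != "id"}))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def trending_query(keyword: str, window: str = "now-15m", top_n: int = 10) -> Dict:
    """Body for GET /tweets/_search: recent posts matching a keyword,
    summarized as their most frequent hashtags."""
    return {
        "size": 0,  # only the aggregation is needed, not the individual hits
        "query": {
            "bool": {
                "must": [{"match": {"text": keyword}}],
                "filter": [{"range": {"created_at": {"gte": window}}}],  # date math
            }
        },
        "aggs": {"top_hashtags": {"terms": {"field": "hashtags", "size": top_n}}},
    }


if __name__ == "__main__":
    sample = [
        {"id": "1", "text": "Flu shots available downtown", "hashtags": ["flu"],
         "user": "a", "created_at": "2018-04-30T12:00:00Z"},
    ]
    print(json.dumps(index_definition(), indent=2))
    print(bulk_payload(sample), end="")
    print(json.dumps(trending_query("flu"), indent=2))
```

Creating the mapping before ingestion keeps each incoming batch cheap to index. Batching posts through `_bulk` avoids one HTTP round trip per post. A dashboard tool such as Kibana could then poll a query like `trending_query` to refresh its view as new posts become searchable.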
Acknowledgements
This research is funded by the NSERC Discovery Grant; computing resources are provided by the High Performance Computing (HPC) Lab and the Department of Computer Science at Lakehead University, Canada. The authors are grateful to Gaurav Sharma for initially setting up the data collection stream, Salimur Choudhury for providing insight on the data analysis, and Andrew Heppner for reviewing and editing drafts.
Cite this article
Shah, N., Willick, D. & Mago, V. A framework for social media data analytics using Elasticsearch and Kibana. Wireless Netw 28, 1179–1187 (2022). https://doi.org/10.1007/s11276-018-01896-2