Streaming Social Media Data Analysis for Events Extraction and Warehousing using Hadoop and Storm: Drug Abuse Case Study

https://doi.org/10.1016/j.procs.2019.09.316
Under a Creative Commons license
open access

Abstract

In the age of big data, enterprises’ information systems are flooded with data generated by social media, which raises the need to integrate these data into the business intelligence process for better decision making. However, these new data are streaming, voluminous, unstructured and variable, and they overwhelm existing data warehousing systems and integration tools, which motivated this research.

In this paper, we propose a large-scale system based on distributed storage and parallel processing for social media data warehousing. Specifically, we combine Storm and Hadoop to extract structured events from social media data and integrate them into the data warehouse. We take advantage of the real-time analysis of streaming data offered by Storm and the batch processing of large data volumes offered by Hadoop, which together facilitate the analysis of streaming social media data.
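The division of labour between the streaming and batch sides can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the Storm or Hadoop APIs, and the keyword list, the tweet fields (`user`, `created_at`, `text`) and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional

# Hypothetical keyword list; the paper's actual extraction rules are not given here.
DRUG_TERMS = {"heroin", "cocaine", "opioid"}


def extract_event(tweet: dict) -> Optional[dict]:
    """Per-tweet step (the role a Storm bolt would play).

    Returns a structured event if the tweet mentions a known term, else None.
    """
    words = {w.strip(".,!?#@").lower() for w in tweet["text"].split()}
    hits = sorted(words & DRUG_TERMS)
    if not hits:
        return None
    return {
        "user": tweet["user"],
        "date": tweet["created_at"][:10],  # assumes an ISO 8601 timestamp
        "drug": hits[0],
    }


def stream_events(tweets: Iterable[dict]) -> Iterator[dict]:
    """Handle tweets one at a time as they arrive, yielding events immediately."""
    for tweet in tweets:
        event = extract_event(tweet)
        if event is not None:
            yield event


def batch_counts(events: Iterable[dict]) -> Counter:
    """Batch step (the role a Hadoop job would play): aggregate stored events."""
    return Counter((e["drug"], e["date"]) for e in events)
```

In this sketch, `stream_events` stands in for the low-latency path and `batch_counts` for the periodic aggregation over accumulated events that would then be loaded into the warehouse.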

For the conceptual representation, we propose a customized multidimensional model in which we add an intermediate table to connect the social media data warehouse with the enterprise data warehouse. We implemented it using Oracle 12c and fed it with events extracted from 1,000,000 tweets using the Pentaho Data Integration tool.
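One way to picture an intermediate table linking the two warehouses is the following minimal sketch. It uses SQLite through Python's standard `sqlite3` module instead of Oracle 12c, and every table and column name is hypothetical: the abstract does not specify which entities the intermediate table connects, so linking an enterprise customer dimension to a social media user dimension is an assumption made purely for illustration.

```python
import sqlite3

# Hypothetical schema: one enterprise dimension, one social media dimension,
# an intermediate (bridge) table between them, and a fact table of events.
SCHEMA = """
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE dim_social_user (
    social_user_id INTEGER PRIMARY KEY,
    handle TEXT
);
CREATE TABLE bridge_customer_social (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    social_user_id INTEGER REFERENCES dim_social_user(social_user_id),
    PRIMARY KEY (customer_id, social_user_id)
);
CREATE TABLE fact_event (
    event_id INTEGER PRIMARY KEY,
    social_user_id INTEGER REFERENCES dim_social_user(social_user_id),
    event_date TEXT,
    drug TEXT
);
"""


def events_per_customer(conn: sqlite3.Connection) -> list:
    """Count social media events per enterprise customer via the bridge table."""
    return conn.execute(
        """
        SELECT c.name, COUNT(f.event_id)
        FROM dim_customer c
        JOIN bridge_customer_social b ON b.customer_id = c.customer_id
        JOIN fact_event f ON f.social_user_id = b.social_user_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


def demo() -> list:
    """Load a few made-up rows and run the cross-warehouse query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Clinic A')")
    conn.executemany(
        "INSERT INTO dim_social_user VALUES (?, ?)",
        [(10, "@clinic_a"), (11, "@other")],
    )
    conn.execute("INSERT INTO bridge_customer_social VALUES (1, 10)")
    conn.executemany(
        "INSERT INTO fact_event VALUES (?, ?, ?, ?)",
        [
            (1, 10, "2019-03-01", "opioid"),
            (2, 10, "2019-03-02", "cocaine"),
            (3, 11, "2019-03-02", "opioid"),
        ],
    )
    return events_per_customer(conn)
```

The point of the bridge table in this sketch is that neither warehouse's dimensions need to change: queries that span both sides go through the intermediate table, and events from unlinked social media users simply do not appear in the joined result.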

Keywords

Data Warehouse
Social Media Data
Twitter
Streaming Data
Storm
Hadoop
Large Scale System
Events
