A Service-Based System for Sentiment Analysis and Visualization of Twitter Data in Realtime

Taher, Yehia; Haque, Rafiqul; AlShaer, Mohammed; Heuvel, Willem Jan v. d.; Zeitouni, Karine; Araujo, Renata; Hacid, Mohand-Saïd; Dbouk, Mohamed

doi:10.1007/978-3-319-68136-8_24


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10380)

Included in the following conference series:

International Conference on Service-Oriented Computing


Abstract

Existing solutions for sentiment analysis fall short when dealing with Twitter data, as they merely exploit hashtags. In this demo, we present SANA: a reusable, service-based architecture that handles streaming data, analyses it on the fly while taking into account the more comprehensive semantics of tweets, and dynamically monitors and visualises sentiment trends through dashboarding and query facilities.


1 Motivation and Challenges

Recently, organizations have begun to rely heavily on external data, especially Twitter data, to perform sentiment analysis and get a better grasp, in realtime, of how their enterprise, products, services and processes are perceived by customers. In particular, a vast volume of Twitter data expresses the emotions of consumers. Realtime analysis of Twitter data enables timely decisions and interventions by the organization, such as adapting its offer to consumer expectations. However, realtime analysis of Twitter data is enormously challenging. The most critical challenges are two-fold: (i) unlike classic relational data, Twitter data are unstructured, and (ii) the velocity (speed) of the data is extremely high and unpredictable. For instance, on average, more than 6000 tweets are posted every second. Several sentiment analytics solutions have been proposed in the literature, e.g., [1,2,3,4]. Unfortunately, to the best of our knowledge, these solutions merely exploit hashtags, which contain only a small fragment of a tweet. In our view, this is clearly not sufficient for a complete analysis because it cannot capture the context of a tweet. In addition, these solutions are built on a traditional architectural paradigm. Therefore, in this paper, we propose SANA, a service-based solution for realtime sentiment analysis of Twitter data, which takes into account both the context and the content of the tweets.

2 System Overview

The multi-layered architecture of SANA consists of various components, which are briefly described in the following.

Data Collection and Ingestion Layer: This layer contains two components: a data collector and a data ingestor. The data collector is a client that binds to one or more data source APIs, which give access to remote repositories after an authentication check using their public keys. Once the connection is established, the data collector starts fetching data streams (i.e., tweets) in realtime. The data ingestor consists of two interfaces. The first interface taps data into the SANA data lake, a distributed Hadoop cluster that resides in the storage layer. The other interface opens a channel to push tweets directly to the data processing components.
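The two ingestor interfaces can be pictured as a simple fan-out. The following is a minimal sketch, not SANA's implementation: the class name DataIngestor, the in-memory list standing in for the Hadoop data lake, and the queue standing in for the processing channel are all assumptions made for illustration.

```python
from queue import Queue
from typing import Callable, Iterable, List


class DataIngestor:
    """Hypothetical ingestor that fans each tweet out to two interfaces."""

    def __init__(self, lake_sink: Callable[[str], None], channel: Queue):
        self.lake_sink = lake_sink  # interface 1: writes to the data lake
        self.channel = channel      # interface 2: feeds the processing layer

    def ingest(self, tweets: Iterable[str]) -> int:
        """Send every tweet to both interfaces and return how many were ingested."""
        ingested = 0
        for tweet in tweets:
            self.lake_sink(tweet)    # persist the raw tweet
            self.channel.put(tweet)  # push it on for realtime analysis
            ingested += 1
        return ingested


# Stand-ins (assumptions): a list replaces the Hadoop cluster,
# a Queue replaces the channel to the processing components.
lake: List[str] = []
channel: Queue = Queue()
ingestor = DataIngestor(lake.append, channel)
ingested = ingestor.ingest(["I love this place", "Worst service ever"])
```

Keeping the two sinks behind separate interfaces means that the storage path and the processing path can be replaced independently, for example by a real cluster client or a message broker.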

Data Processing Layer: The components in this layer perform several tasks. The two main tasks are data analysis and visualization; data distribution and query execution are two additional tasks performed in this layer. The analysis starts with filtering the incoming Twitter data. SANA’s data filter eliminates unnecessary strings from tweets and keeps the core text required for analysis. It also allocates a unique identifier to each tweet. Then, the text classifier extracts and classifies positive and negative sentiments from the texts. We used the multinomial naïve Bayes classifier (a supervised machine learning technique) along with Chi Square (\(\chi^2\)) feature selection. The multinomial naïve Bayes model is trained on datasets whose texts are labeled as positive or negative sentiment. The Chi Square (\(\chi^2\)) function tests whether the occurrence of a specific string and the occurrence of a specific class are independent. The NER tagger extracts the contexts of the classified texts. It labels the sequences of context-related strings (e.g., person, location, and organization) in a tweet. After the classification is done, the data distributor sends the results to the local disk, the data lake (Hadoop cluster), and the graph storage. Queries that retrieve the comprehensive details of the results are submitted through SANA’s query interface.
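The filtering, feature selection, and classification steps can be illustrated with a small, self-contained sketch. It is not SANA's code. The filter rules (dropping URLs, mentions, and retweet markers), the counter used for identifiers, the Laplace smoothing, the toy training tweets, and all function names are assumptions. The \(\chi^2\) score is computed from the standard 2x2 contingency table of term presence against class membership.

```python
import math
import re
from collections import Counter
from itertools import count

_next_id = count(1)  # hypothetical id source; the paper does not specify one


def filter_tweet(raw):
    """Drop URLs, mentions and retweet markers (assumed rules); attach a unique id."""
    text = re.sub(r"https?://\S+|@\w+|\bRT\b", " ", raw)
    return {"id": next(_next_id), "tokens": re.findall(r"[a-z']+", text.lower())}


def chi_square(docs, labels, term, cls):
    """Chi-square statistic for independence of a term and a class."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cls = term in tokens, label == cls
        if has_term and in_cls:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_cls:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all classes."""
    vocab = {t for d in docs for t in d}
    classes = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:k])


class MultinomialNB:
    """Multinomial naive Bayes with Laplace smoothing over selected features."""

    def fit(self, docs, labels, features):
        self.features = features
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            class_docs = [d for d, l in zip(docs, labels) if l == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in features)
            total = sum(counts.values()) + len(features)
            self.log_like[c] = {t: math.log((counts[t] + 1) / total) for t in features}
        return self

    def predict(self, tokens):
        def score(c):
            known = (t for t in tokens if t in self.features)
            return self.log_prior[c] + sum(self.log_like[c][t] for t in known)
        return max(self.classes, key=score)


# Toy labeled data, invented for illustration only.
train = [
    ("I love this beautiful country", "positive"),
    ("Great people and great food", "positive"),
    ("I hate the terrible traffic", "negative"),
    ("Awful weather and terrible service", "negative"),
]
docs = [filter_tweet(text)["tokens"] for text, _ in train]
labels = [label for _, label in train]
features = select_features(docs, labels, k=14)
model = MultinomialNB().fit(docs, labels, features)
prediction = model.predict(filter_tweet("Terrible food and awful traffic")["tokens"])
```

In this toy data the words "i" and "and" occur equally often in both classes, so their \(\chi^2\) score is zero and feature selection discards them. Context extraction by the NER tagger is not sketched here because it depends on a trained tagging model.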

Data Storage Layer: Two different types of storage are integrated in SANA, a data lake and a graph storage, in which the results are stored. The data lake is a cluster of nodes across which data blocks are distributed. SANA adopts a data lake to deal with massive-scale data. The graph-based storage of SANA helps to build a knowledge graph of the classified texts and their contexts.

Presentation Layer: SANA provides a graphical user interface (GUI) which consists of a control panel and a textbox for data visualization. The control panel provides three services. The data collection service calls and loads the data collector. The backend service calls the processing servers, the graph database server, and the coordination server, which maintains configuration information and provides the distributed synchronization service. The query execution service calls and loads the query processor. Lastly, the visualization interface loads the data visualizer and displays a pie chart that shows the percentages of positive and negative sentiment.

3 Demonstration

SANA is offered both as a desktop-based solution and as software as a service (SaaS) on the cloud. It therefore provides two different user interfaces: desktop-based and web-based. In this paper, we describe the former. In the first step, a user starts all the servers by clicking the button called running background services in the user interface (we assume that these servers are installed and configured on the user’s machine). This starts the data acquisition server, the processing servers, the Hadoop cluster, and the graph storage server. In the next step, the user starts the SANA sentiment analytics application. Upon clicking the start application button, a window pops up and the user selects the application jar file provided by SANA. Once the file is imported, the SANA realtime application starts, and from this point until visualization all tasks are performed automatically. SANA’s data collector establishes a connection with the Twitter data center using an authentication API and starts fetching data (the user can view the data collection step on the screen); it then ingests the raw data into SANA’s topology, which is essentially the processing logic. Figure 1A shows the topology.

The topology contains three components, Tweet filter, Tweet classifier, and Tweet NER, which perform three tasks: filtering data, sentiment classification, and context extraction. In the next step, three tasks are carried out in parallel. First, the consumer sentiments are visualized in a pie chart which shows the percentages of positive and negative views on a concept, product, or service, which in our demonstration is a land. Figure 1B presents the results, which are refreshed in less than a second. Users will observe that the sentiment analysis results are updated constantly, as classification is carried out in realtime over the incoming tweets. Second, SANA’s data distributor stores the results in the data lake (Hadoop cluster) and the graph storage server. The results are also stored on the local disk. Third, the knowledge graph, consisting of the extracted sentiments and their contexts, is visualized by our graph storage. Figure 1C shows the knowledge graph.
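The constantly updated percentages behind the pie chart amount to a running tally over the classifier's output. A minimal sketch follows, assuming that each classified tweet arrives as a plain label string; the class name SentimentTally is hypothetical and not part of SANA.

```python
from collections import Counter


class SentimentTally:
    """Running share of each sentiment label, recomputed on every new tweet."""

    def __init__(self):
        self.counts = Counter()

    def update(self, label):
        """Record one classified tweet and return the refreshed percentages."""
        self.counts[label] += 1
        return self.percentages()

    def percentages(self):
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {label: 100.0 * n / total for label, n in self.counts.items()}


tally = SentimentTally()
latest = {}
for label in ["positive", "negative", "positive", "positive"]:
    latest = tally.update(label)  # values a pie chart would redraw from
```

Because only per-label counters are kept, each update costs constant time regardless of how many tweets have already been processed.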

Finally, a user might be interested in performing correlated queries to extract more knowledge from the tweets. When the user clicks the Analysis button, a textbox appears on the screen. The user then types a query such as “match (n) - -> 2 with n, count (*) as rel_cnt where rel_cnt> 2 return n.Id n.text Limit 15”. In our demo, this query returned 15 tweets. Each of these tweets contains more than two relations among the nodes that represent context and sentiment. Figure 1D shows the textual representation of the results of the query. We provide a video of this demonstration at http://cognitus-research.webs.com, which gives more detail.
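The intent of such a query can be mirrored in plain code. The sketch below assumes a reading of the query as "return the id and text of up to 15 nodes that have more than two outgoing relations". The edge-list representation, the relation names HAS_SENTIMENT and MENTIONS, and the sample tweets are all hypothetical.

```python
from collections import Counter
from itertools import islice


def nodes_with_many_relations(nodes, edges, min_relations=2, limit=15):
    """Return (id, text) of nodes with more than min_relations outgoing edges."""
    out_degree = Counter(source for source, _relation, _target in edges)
    matches = (
        (node_id, text)
        for node_id, text in nodes.items()
        if out_degree[node_id] > min_relations
    )
    return list(islice(matches, limit))  # cap the result like a LIMIT clause


# Invented sample graph: tweet nodes linked to sentiment and context nodes.
nodes = {1: "Lovely beaches in Beirut, Lebanon", 2: "Bad traffic"}
edges = [
    (1, "HAS_SENTIMENT", "positive"),
    (1, "MENTIONS", "Lebanon"),
    (1, "MENTIONS", "Beirut"),
    (2, "HAS_SENTIMENT", "negative"),
]
result = nodes_with_many_relations(nodes, edges)
```

Here only the first tweet qualifies, since it has three outgoing relations while the second has one.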

References

1. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis of twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp. 30–38. Association for Computational Linguistics, June 2011
2. Chong, W.Y., Selvaretnam, B., Soon, L.K.: Natural language processing for sentiment analysis: an exploratory analysis on tweets. In: 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology (ICAIET), pp. 212–217, Kota Kinabalu (2014)
3. Nasukawa, T., Yi, J.: Sentiment analysis: capturing favorability using natural language processing. In: Proceedings of the 2nd International Conference on Knowledge Capture, pp. 70–77. ACM, October 2003
4. Neethu, M.S., Rajasree, R.: Sentiment analysis in twitter using machine learning techniques. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), pp. 1–5, Tiruchengode (2013)


Author information

Authors and Affiliations

DAVID, Université de Versailles Saint-Quentin-en-Yvelines, Versailles, France
Yehia Taher & Karine Zeitouni
Cognitus, Paris, France
Rafiqul Haque
LIRIS, Université Claude Bernard 1, Villeurbanne, France
Mohammed AlShaer & Mohand-Saïd Hacid
ERISS, Tilburg University, Tilburg, The Netherlands
Willem Jan v. d. Heuvel & Renata Araujo
Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
Mohammed AlShaer & Mohamed Dbouk
Lebanese University, Beirut, Lebanon
Mohamed Dbouk


Corresponding author

Correspondence to Yehia Taher.

Editor information

Editors and Affiliations

Université de Toulouse, Toulouse, France
Khalil Drira
Southeast University, Jiangsu, China
Hongbing Wang
Rochester Institute of Technology, Rochester, New York, USA
Qi Yu
Macquarie University, Sydney, New South Wales, Australia
Yan Wang
Concordia University, Montreal, Québec, Canada
Yuhong Yan
CNRS, Université de Lorraine, Nancy, France
François Charoy
Vienna University of Economics and Business, Vienna, Austria
Jan Mendling
IBM Research, San Jose, California, USA
Mohamed Mohamed
Harbin Institute of Technology, Harbin, China
Zhongjie Wang
University of Monastir, Monastir, Tunisia
Sami Bhiri


About this paper

Cite this paper

Taher, Y. et al. (2017). A Service-Based System for Sentiment Analysis and Visualization of Twitter Data in Realtime. In: Drira, K., et al. Service-Oriented Computing – ICSOC 2016 Workshops. ICSOC 2016. Lecture Notes in Computer Science, vol 10380. Springer, Cham. https://doi.org/10.1007/978-3-319-68136-8_24


DOI: https://doi.org/10.1007/978-3-319-68136-8_24
Published: 27 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68135-1
Online ISBN: 978-3-319-68136-8
eBook Packages: Computer Science, Computer Science (R0)
