Sentiment Classification Using Paragraph Vector and Cognitive Big Data Semantics on Apache Spark

Abstract:

Apache Spark allows us to write a distributed version of any machine learning algorithm, which can be easily scaled up for a larger dataset on a cluster of commodity hardware. In this paper, we propose the hybridization of paragraph vector with distributed, parallel versions of six well-known machine learning techniques for sentiment analysis. We employed a distributed implementation of a neural network language model to obtain paragraph vectors for a given corpus. On the paragraph vectors so obtained, we employed a host of distributed classification algorithms available in Apache Spark to perform sentiment classification. As baselines for comparison, we considered two approaches: a Bag-of-Words-based document-term matrix (DTM) and a hashing-trick-based DTM. We experimented with a movie review dataset of size 992 MB. Among the six classifiers employed, MLP was statistically indistinguishable from GBT and SVM and statistically significantly outperformed the remaining classifiers, yielding an area under the ROC curve (AUC) of 95.44%.
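
The difference between the two baseline representations named in the abstract can be illustrated with a minimal, single-machine sketch. It is not the paper's distributed Spark implementation: the function names (`tokenize`, `bow_dtm`, `hashed_dtm`), the whitespace tokenizer, the toy documents, and the use of CRC32 as the hash function are all assumptions made for illustration, and Spark's own hashing transformer uses a different hash function.

```python
import zlib
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bow_dtm(docs):
    """Bag-of-Words DTM: one column per distinct term in the corpus.

    Requires a full pass over the corpus to build the vocabulary first.
    """
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows


def hashed_dtm(docs, num_features=16):
    """Hashing-trick DTM: column = hash(term) mod num_features.

    No vocabulary is built, so each document can be processed
    independently; distinct terms may collide in the same column.
    CRC32 is an illustrative stand-in for the hash function.
    """
    rows = []
    for doc in docs:
        row = [0] * num_features
        for tok in tokenize(doc):
            col = zlib.crc32(tok.encode("utf-8")) % num_features
            row[col] += 1
        rows.append(row)
    return rows


# Toy corpus (assumed, not from the paper's dataset).
docs = ["a great great movie", "a dull movie"]
vocab, bow = bow_dtm(docs)
hashed = hashed_dtm(docs, num_features=16)
```

The Bag-of-Words matrix has exactly as many columns as there are distinct terms, whereas the hashed matrix has a fixed width chosen up front. Because the hashed version needs no shared vocabulary, it is the easier of the two to compute independently on each partition of a distributed corpus, at the cost of possible collisions.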

Published in: 2018 IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)

Date of Conference: 16-18 July 2018

Date Added to IEEE Xplore: 07 October 2018

DOI: 10.1109/ICCI-CC.2018.8482085

Conference Location: Berkeley, CA, USA