Sentiment Classification Using Paragraph Vector and Cognitive Big Data Semantics on Apache Spark | IEEE Conference Publication | IEEE Xplore

Abstract:

Apache Spark allows us to write a distributed version of any machine learning algorithm, which can then be scaled to larger datasets on a cluster of commodity hardware. In this paper, we propose hybridizing paragraph vectors with distributed, parallel versions of six well-known machine learning techniques for sentiment analysis. We employed a distributed implementation of a neural network language model to obtain paragraph vectors for a given corpus. On the paragraph vectors so obtained, we applied a host of distributed classification algorithms available in Apache Spark to perform sentiment classification. For comparison, we considered two baseline methods: a Bag-of-Words based document-term matrix (DTM) and a hashing-trick based DTM. We experimented with a movie review dataset of size 992 MB. Among the six classifiers employed, the multilayer perceptron (MLP) was statistically indistinguishable from gradient-boosted trees (GBT) and the support vector machine (SVM), while it statistically significantly outperformed the remaining classifiers, yielding an area under the ROC curve (AUC) of 95.44%.
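
The two baseline representations named in the abstract can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation: the whitespace tokenizer, the helper names `bow_dtm` and `hashed_dtm`, the CRC32 hash, and the toy documents are all assumptions made for illustration. The sketch only shows how a Bag-of-Words DTM needs a vocabulary built from the whole corpus, while a hashing-trick DTM maps each token straight to one of a fixed number of columns.

```python
import zlib
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bow_dtm(docs):
    # Bag-of-Words DTM: one column per distinct token in the corpus,
    # so the vocabulary must be collected before any row is built.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows


def hashed_dtm(docs, num_features=8):
    # Hashing-trick DTM: the column is hash(token) mod num_features.
    # No vocabulary is stored; distinct tokens may collide in a column.
    rows = []
    for doc in docs:
        row = [0] * num_features
        for tok in tokenize(doc):
            col = zlib.crc32(tok.encode("utf-8")) % num_features
            row[col] += 1
        rows.append(row)
    return rows


docs = ["a great great movie", "a dull movie"]  # toy corpus, not the paper's data
vocab, bow = bow_dtm(docs)
hashed = hashed_dtm(docs, num_features=8)
print(vocab)
print(bow)
print(hashed)
```

The hashed rows have a fixed width regardless of corpus size, which is what makes the approach attractive in a distributed setting: each document can be vectorized independently, at the cost of possible collisions between tokens.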
Date of Conference: 16-18 July 2018
Date Added to IEEE Xplore: 07 October 2018
Conference Location: Berkeley, CA, USA
