Abstract
With the growth of the internet, more and more people share reviews online. Efficient sentiment analysis of such reviews using deep learning has become an active research topic in the natural language processing community. However, how to improve the performance of a deep neural network remains an open question. In this paper, we propose an algorithm that combines deep learning, fuzzy clustering, and information geometry. In particular, the distribution of the training samples is treated as prior knowledge and is encoded in fuzzy deep belief networks using an improved Fuzzy C-Means (FCM) clustering algorithm. To improve FCM, we use information geometry to construct a geodesic distance between the distributions over features for classification. Based on the clustering results, we then embed the fuzzy rules learned by FCM into fuzzy deep belief networks to improve their performance. Finally, we evaluate our proposal on empirical data sets dedicated to sentiment classification. The results show that our algorithm yields a significant improvement over existing methods.



References
Li S, Lee SYM, Chen Y, Huang C, Zhou G (2010) Sentiment classification and polarity shifting. In: Proceedings of the 23rd international conference on computational linguistics, pp 635–643
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2010) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307
Ravishankar N, Raghunathan S (2017) Corpus based sentiment classification of Tamil movie tweets using syntactic patterns. Comput Sci 8(2):172–178
HaCohen-Kerner Y, Badash H (2016) Positive and negative sentiment words in a blog corpus written in Hebrew. Procedia Comput Sci 96(50):733–743
Gao K, Su S, Wang J (2015) A sentiment analysis hybrid approach for microblogging and E-commerce corpus. In: 7th international conference on modelling, identification and control (ICMIC), pp 1–6
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. Proc EMNLP-02 10(2):79–86
Turney P (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Annual meeting of the association of computational linguistics, pp 417–424
Turney PD, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst 21(1):315–346
Da Silva NFF, Coletta LFS, Hruschka ER, Hruschka ER Jr (2016) Using unsupervised information to improve semi-supervised tweet sentiment classification. Inf Sci 355(1):348–365
Torresani L (2014) Weakly supervised learning. Comput Vis A Ref Guide 10(2–3):883–885
Guan Z, Chen L, Zhao W, Zheng Y, Tan S, Cai D (2016) Weakly-supervised deep learning for customer review sentiment classification. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI-16)
Hady MFA, Schwenker F (2013) Semi-supervised learning. In: Bianchini M, Maggini M, Jain L (eds) Handbook on neural information processing. intelligent systems reference library, vol 49. Springer, Berlin
Li S, Wang Z, Zhou G, Lee SYM (2017) Semi-supervised learning for imbalanced sentiment classification. J R Stat Soc 172(2):530–530
Hinton GE, Osindero S, Teh Y (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(1):1527–1554
Zhou S, Chen Q, Wang X (2014) Fuzzy deep belief networks for semi-supervised sentiment classification. Neurocomputing 131(1):312–322
Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
Basseville M (2013) Divergence measures for statistical data processing—an annotated bibliography. Signal Process 93(4):621–633
Zhao K, Alavi A, Wiliem A, Lovell BC (2005) A novel information geometric approach to variable selection in MLP networks. Neural Netw 18(2):1309–1318
Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10(2):251–276
Zhao J (2015) Natural gradient learning algorithms for RBF networks. Neural Comput 27(2):481–505
Bezdek JC, Ehrlich R, Full W (1984) FCM: the Fuzzy C-means clustering algorithm. Comput Geosci 10(2–3):191–203
Zhuang L, Jing F, Zhu Z (2006) Movie review mining and summarization. In: Proceedings of the 15th ACM international conference on information and knowledge management, pp 43–50
Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pp 1–12
Wu F, Song Y, Huang Y (2015) Microblog sentiment classification with contextual knowledge regularization. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, pp 2332–2338
Xia Y, Wang AL, Wong KF, Xu M (2008) Lyric-based song sentiment classification with sentiment vector space model. In: Annual meeting of the association of computational linguistics, pp 133–136
McDonald R, Hannan K, Neylon T (2007) Structured models for fine-to-coarse sentiment analysis. In: Annual meeting of the association of computational linguistics, pp 432–439
Deng Z, Luo K, Yu H (2014) A study of supervised term weighting scheme for sentiment analysis. Expert Syst Appl 41(1):3506–3513
Aue A, Gamon M (2005) Customizing sentiment classifiers to new domains: a case study. In: International conference on recent advances in natural language processing, pp 210–231
Tan S, Wu G, Tang AH, Cheng X (2007) A novel scheme for domain-transfer problem in the context of sentiment analysis. In: ACM conference on information & knowledge management, pp 979–982
Li S, Zong C (2008) Multi-domain sentiment classification. In: Annual meeting of the association of computational linguistics, association for computational linguistics, pp 257–260
Pan J, Ni X, Sun J, Yang Q, Chen Z (2010) Cross-domain sentiment classification via spectral feature alignment. In: International World Wide Web Conference, ACM, pp 751–760
Biagioni R (2016) Unsupervised sentiment classification. Springer, Cham
Read J, Carroll J (2009) Weakly supervised techniques for domain-independent sentiment classification. In: Proceedings of the 1st international CIKM workshop on topic-sentiment analysis for mass opinion, TSA’09, pp 45–52
Zhao W, Guan Z, Chen L, He X, Cai D, Wang B, Wang Q (2018) Weakly-supervised deep embedding for product review sentiment analysis. IEEE Trans Knowl Data Eng 30(1):1–23
Zhu X (2007) Semi-supervised learning literature survey. Ph.D. thesis
Goldberg AB, Zhu X (2006) Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In: Proceedings of text graphs: the first workshop on graph based methods for natural language processing, association for computational linguistics, pp 45–52
Sindhwani V, Melville P (2008) Document-word co-regularization for semi-supervised sentiment analysis. In: IEEE international conference on data mining, pp 1025–1030
Zhou S, Chen Q, Wang X (2010) Active deep networks for semi-supervised sentiment classification. In: International conference on computational linguistics, poster, pp 1515–1523
Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. Parallel Distrib Process Explor Micro Struct Cognit 1:194–281
Park K-J, Lee J-P, Lee DY (2012) Optimal design of fuzzy clustering-based fuzzy neural networks for pattern classification. Int J Grid Distrib Comput 5(3):361–831
Rubio JJ, Pacheco J (2009) An stable online clustering fuzzy neural network for nonlinear system identification. Neural Comput Appl 18(1):633–641
Anuar N, Zakaria Z (2012) Electricity load profile determination by using Fuzzy C-means and probability neural network. Energy Procedia 14(5):1861–1869
Kass RE, Vos PW (1997) Geometrical foundations of asymptotic inference. Wiley, New York
Amari S, Kawanabe M (1997) Information geometry of estimating functions in semiparametric statistical models. Bernoulli 3:29–54
Dasgupta S, Ng V (2009) Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In: Joint conference of the 47th annual meeting of the association for computational linguistics and 4th international joint conference on natural language processing of the Asian federation of natural language processing, pp 701–709
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. Comput Sci 3(21):15–23
Frieden BR (2004) Science from Fisher information: a unification. Cambridge Univ. Press, Cambridge
Devroye L, Gyorfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin. ISBN:0-3879-4618-7
Nielsen F, Garcia V (2009) Statistical exponential families: a digest with flash cards. arXiv:0911.4863
Nielsen F (2013) Pattern learning and recognition on statistical manifolds. Int Workshop Similarity Based Pattern Recognit 7953:1–25
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127
Kamvar S, Klein D, Manning C (2003) Spectral learning. In: International joint conferences on artificial intelligence. AAAI, Catalonia, pp 561–566
Xiong X, Chan KL, Tan KL (2012) Similarity-driven cluster merging method for unsupervised fuzzy clustering. In: Proceedings of the 20th international conference on uncertainty in artificial intelligence, pp 55–67
Smith LN (2017) Cyclical learning rates for training neural networks. In: Applications of computer vision (WACV), 2017 IEEE winter conference on, pp 464–472. IEEE
Amari S (2001) Information geometry on hierarchy of probability distributions. IEEE Trans Inf Theory 47(5):1701–1711
Appendix
Proof of Theorem 1. By the central limit theorem, for any distribution and a sufficiently large j, we have
Then, there exists a positive function c(j), decreasing to zero in the limit, such that
For large j, we deduce that \(\begin{gathered} d_F((\mu_1,\sigma_1),(\mu_2,\sigma_2)) \\ = \sqrt{2}\,\ln \left\{ \frac{F((\mu_1,\sigma_1),(\mu_2,\sigma_2)) + (\mu_1-\mu_2)^2 + 2(\sigma_1^2+\sigma_2^2)}{4\sigma_1\sigma_2} \right\} \\ = \sqrt{2}\,\ln \left[ \frac{\sigma_1^2 \sqrt{(\mu^2+2\sigma^2)(\mu^2+8+O(\sigma))} + \sigma_1^2\mu^2 + 4\sigma_1^2(1+\sigma+O(\sigma^2))}{4\sigma_1\sigma_2} \right] \end{gathered}\) Then,
where \(r=\sqrt{\mu^2+\sigma^2}\), and \(c_1\) and \(c_2\) are positive constants. Thus the result holds.
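As an illustration only, the closed form of \(d_F\) between two univariate normal distributions that appears in the derivation above can be evaluated numerically. The sketch below is not the authors' code. The term \(F\) is not defined in this appendix, so the code assumes the standard Fisher-Rao expression \(F=\sqrt{[(\mu_1-\mu_2)^2+2(\sigma_1-\sigma_2)^2][(\mu_1-\mu_2)^2+2(\sigma_1+\sigma_2)^2]}\); the function name `fisher_rao_distance` is hypothetical.

```python
import math


def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Geodesic (Fisher-Rao) distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    dmu2 = (mu1 - mu2) ** 2
    # Assumption: F is the standard cross term of the Fisher-Rao closed form;
    # its definition is not given in this excerpt.
    f = math.sqrt(
        (dmu2 + 2 * (sigma1 - sigma2) ** 2) * (dmu2 + 2 * (sigma1 + sigma2) ** 2)
    )
    ratio = (f + dmu2 + 2 * (sigma1 ** 2 + sigma2 ** 2)) / (4 * sigma1 * sigma2)
    # The ratio is >= 1 analytically; clamp to guard against rounding below 1.
    return math.sqrt(2) * math.log(max(ratio, 1.0))
```

Under this assumption the distance is symmetric in its two arguments, and for equal means it reduces to \(\sqrt{2}\,|\ln(\sigma_2/\sigma_1)|\), which gives a quick sanity check.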
Next, we prove the superiority of \(d_F\) compared with KLD.
For two univariate normal distributions, KLD has the closed form [56]
\(KLD(({\mu _1},{\sigma _1})||({\mu _2},{\sigma _2}))=\frac{1}{2}[2\ln ({\sigma _2}/{\sigma _1})+\sigma _{1}^{2}/\sigma _{2}^{2}+{({\mu _1} - {\mu _2})^2}/\sigma _{2}^{2} - 1].\)
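The expression above translates directly into code. The following is a minimal sketch of that formula for two univariate normal distributions, not the authors' implementation; the function name `kld_normal` is hypothetical.

```python
import math


def kld_normal(mu1, sigma1, mu2, sigma2):
    """KLD(N(mu1, sigma1^2) || N(mu2, sigma2^2)), as in the closed form above."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return 0.5 * (
        2 * math.log(sigma2 / sigma1)
        + sigma1 ** 2 / sigma2 ** 2
        + (mu1 - mu2) ** 2 / sigma2 ** 2
        - 1
    )
```

Evaluated as written, the value depends on the order of the arguments: `kld_normal(0, 1, 0, 2)` and `kld_normal(0, 2, 0, 1)` differ, while identical distributions give zero.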
Then for large j, we have
Then, according to Theorem 1, with at least a probability of \(1 - \varepsilon\),
which implies that \(KLD(\cdot)\) has lower sensitivity than \(d_F(\cdot)\).
Cite this article
Wang, M., Ning, ZH., Li, T. et al. Information geometry enhanced fuzzy deep belief networks for sentiment classification. Int. J. Mach. Learn. & Cyber. 10, 3031–3042 (2019). https://doi.org/10.1007/s13042-018-00920-3