Information geometry enhanced fuzzy deep belief networks for sentiment classification

Wang, Meng; Ning, Zhen-Hu; Li, Tong; Xiao, Chuang-Bai

doi:10.1007/s13042-018-00920-3

Original Article
Published: 09 January 2019

Volume 10, pages 3031–3042 (2019)

International Journal of Machine Learning and Cybernetics

Meng Wang¹, Zhen-Hu Ning¹, Tong Li¹ & Chuang-Bai Xiao¹

Abstract

With the development of the internet, more and more people share reviews online. Efficient sentiment analysis of such reviews using deep learning techniques has become an emerging research topic that attracts growing attention from the natural language processing community. However, how to improve the performance of deep neural networks remains an open question. In this paper, we propose an algorithm that combines deep learning, fuzzy clustering and information geometry. In particular, the distribution of the training samples is treated as prior knowledge and is encoded in fuzzy deep belief networks using an improved Fuzzy C-Means (FCM) clustering algorithm. We improve FCM by using information geometry to construct a geodesic distance between the feature distributions used for classification. Based on the clustering results, we then embed the fuzzy rules learned by FCM into fuzzy deep belief networks to improve their performance. Finally, we evaluate our proposal on empirical data sets dedicated to sentiment classification. The results show that our algorithm significantly improves on existing methods.
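The abstract describes the clustering step only at a high level. For orientation, here is a minimal sketch of the classical Fuzzy C-Means algorithm of Bezdek et al. (alternating weighted-mean center updates and membership updates with fuzzifier m), with the distance passed in as a parameter. The paper's improved FCM replaces the point-to-center distance with an information-geometric geodesic distance whose exact definition is not given here, so the Euclidean default, the names `euclidean` and `fuzzy_c_means`, and all parameter defaults are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Default point-to-center distance of classical FCM (hypothetical helper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(
    data: List[Vector],
    c: int,
    m: float = 2.0,
    max_iter: int = 100,
    tol: float = 1e-6,
    distance: Callable[[Vector, Vector], float] = euclidean,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Classical FCM sketch; returns (centers, u) where u[i][k] is the membership
    of sample i in cluster k and every row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers: List[List[float]] = []
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ik^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * data[i][d] for i in range(n)) / wsum for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        p = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [distance(data[i], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Sample sits exactly on a center: split membership among zero-distance centers.
                hits = [1.0 if dist == 0.0 else 0.0 for dist in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[k] / dists[j]) ** p for j in range(c)) for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    centers, u = fuzzy_c_means(data, c=2)
    print(centers)
    print([max(range(2), key=lambda k: row[k]) for row in u])
```

Keeping `distance` pluggable is the point of the sketch: a geodesic distance between feature distributions could be supplied in place of `euclidean` without touching the loop, although with a non-Euclidean distance the weighted-mean center update would generally also need to change, which this sketch does not attempt.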

References

1. Li S, Lee SYM, Chen Y, Huang C, Zhou G (2010) Sentiment classification and polarity shifting. In: Proceedings of the 23rd international conference on computational linguistics, pp 635–643
2. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2010) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307
3. Ravishankar N, Raghunathan S (2017) Corpus based sentiment classification of Tamil movie tweets using syntactic patterns. Comput Sci 8(2):172–178
4. HaCohen-Kerner Y, Badash H (2016) Positive and negative sentiment words in a blog corpus written in Hebrew. Procedia Comput Sci 96(50):733–743
5. Gao K, Su S, Wang J (2015) A sentiment analysis hybrid approach for microblogging and E-commerce corpus. In: 7th international conference on modelling, identification and control (ICMIC), pp 1–6
6. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. Proc EMNLP-02 10(2):79–86
7. Turney P (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Annual meeting of the association for computational linguistics, pp 417–424
8. Turney PD, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst 21(1):315–346
9. Da Silva NFF, Coletta LFS, Hruschka ER, Hruschka ER Jr (2016) Using unsupervised information to improve semi-supervised tweet sentiment classification. Inf Sci 355(1):348–365
10. Torresani L (2014) Weakly supervised learning. Comput Vis A Ref Guide 10(2–3):883–885
11. Guan Z, Chen L, Zhao W, Zheng Y, Tan S, Cai D (2016) Weakly-supervised deep learning for customer review sentiment classification. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI-16)
12. Hady MFA, Schwenker F (2013) Semi-supervised learning. In: Bianchini M, Maggini M, Jain L (eds) Handbook on neural information processing. Intelligent systems reference library, vol 49. Springer, Berlin
13. Li S, Wang Z, Zhou G, Lee SYM (2017) Semi-supervised learning for imbalanced sentiment classification. J R Stat Soc 172(2):530–530
14. Hinton GE, Osindero S, Teh Y (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(1):1527–1554
15. Zhou S, Chen Q, Wang X (2014) Fuzzy deep belief networks for semi-supervised sentiment classification. Neurocomputing 131(1):312–322
16. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
17. Basseville M (2013) Divergence measures for statistical data processing—an annotated bibliography. Signal Process 93(4):621–633
18. Zhao K, Alavi A, Wiliem A, Lovell BC (2005) A novel information geometric approach to variable selection in MLP networks. Neural Netw 18(2):1309–1318
19. Amari SI (1998) Natural gradient works efficiently in learning. Neural Comput 10(2):251–276
20. Zhao J (2015) Natural gradient learning algorithms for RBF networks. Neural Comput 27(2):481–505
21. Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203
22. Zhuang L, Jing F, Zhu Z (2006) Movie review mining and summarization. In: Proceedings of the 15th ACM international conference on information and knowledge management, pp 43–50
23. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, pp 1–12
24. Wu F, Song Y, Huang Y (2015) Microblog sentiment classification with contextual knowledge regularization. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence, pp 2332–2338
25. Xia Y, Wang AL, Wong KF, Xu M (2008) Lyric-based song sentiment classification with sentiment vector space model. In: Annual meeting of the association for computational linguistics, pp 133–136
26. McDonald R, Hannan K, Neylon T (2007) Structured models for fine-to-coarse sentiment analysis. In: Annual meeting of the association for computational linguistics, pp 432–439
27. Deng Z, Luo K, Yu H (2014) A study of supervised term weighting scheme for sentiment analysis. Expert Syst Appl 41(1):3506–3513
28. Aue A, Gamon M (2005) Customizing sentiment classifiers to new domains: a case study. In: International conference on recent advances in natural language processing, pp 210–231
29. Tan S, Wu G, Tang AH, Cheng X (2007) A novel scheme for domain-transfer problem in the context of sentiment analysis. In: ACM conference on information & knowledge management, pp 979–982
30. Li S, Zong C (2008) Multi-domain sentiment classification. In: Annual meeting of the association for computational linguistics, pp 257–260
31. Pan J, Ni X, Sun J, Yang Q, Chen Z (2010) Cross-domain sentiment classification via spectral feature alignment. In: International World Wide Web Conference, ACM, pp 751–760
32. Biagioni R (2016) Unsupervised sentiment classification. Springer, Cham
33. Read J, Carroll J (2009) Weakly supervised techniques for domain-independent sentiment classification. In: Proceedings of the 1st international CIKM workshop on topic-sentiment analysis for mass opinion, TSA’09, pp 45–52
34. Zhao W, Guan Z, Chen L, He X, Cai D, Wang B, Wang Q (2018) Weakly-supervised deep embedding for product review sentiment analysis. IEEE Trans Knowl Data Eng 30(1):1–23
35. Zhu X (2007) Semi-supervised learning literature survey. Ph.D. thesis
36. Goldberg AB, Zhu X (2006) Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In: Proceedings of TextGraphs: the first workshop on graph based methods for natural language processing, association for computational linguistics, pp 45–52
37. Sindhwani V, Melville P (2008) Document-word co-regularization for semi-supervised sentiment analysis. In: IEEE international conference on data mining, pp 1025–1030
38. Zhou S, Chen Q, Wang X (2010) Active deep networks for semi-supervised sentiment classification. In: International conference on computational linguistics, poster, pp 1515–1523
39. Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. Parallel Distrib Process Explor Micro Struct Cognit 1:194–281
40. Park K-J, Lee J-P, Lee DY (2012) Optimal design of fuzzy clustering-based fuzzy neural networks for pattern classification. Int J Grid Distrib Comput 5(3):361–831
41. Rubio JJ, Pacheco J (2009) An stable online clustering fuzzy neural network for nonlinear system identification. Neural Comput Appl 18(1):633–641
42. Anuar N, Zakaria Z (2012) Electricity load profile determination by using Fuzzy C-means and probability neural network. Energy Procedia 14(5):1861–1869
43. Kass RE, Vos PW (1997) Geometrical foundations of asymptotic inference. Wiley, New York
44. Amari S, Kawanabe M (1997) Information geometry of estimating functions in semiparametric statistical models. Bernoulli 3:29–54
45. Dasgupta S, Ng V (2009) Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In: Joint conference of the 47th annual meeting of the association for computational linguistics and 4th international joint conference on natural language processing of the Asian federation of natural language processing, pp 701–709
46. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. Comput Sci 3(21):15–23
47. Frieden BR (2004) Science from Fisher information: a unification. Cambridge Univ. Press, Cambridge
48. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin. ISBN 0-387-94618-7
49. Nielsen F, Garcia V (2009) Statistical exponential families: a digest with flash cards. arXiv:0911.4863
50. Nielsen F (2013) Pattern learning and recognition on statistical manifolds. Int Workshop Similarity Based Pattern Recognit 7953:1–25
51. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
52. Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2:1–127
53. Kamvar S, Klein D, Manning C (2003) Spectral learning. In: International joint conferences on artificial intelligence. AAAI, Catalonia, pp 561–566
54. Xiong X, Chan KL, Tan KL (2012) Similarity-driven cluster merging method for unsupervised fuzzy clustering. In: Proceedings of the 20th international conference on uncertainty in artificial intelligence, pp 55–67
55. Smith LN (2017) Cyclical learning rates for training neural networks. In: Applications of computer vision (WACV), 2017 IEEE winter conference on, pp 464–472. IEEE
56. Amari S (2001) Information geometry on hierarchy of probability distributions. IEEE Trans Inf Theory 47(5):1701–1711

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, People’s Republic of China
Meng Wang, Zhen-Hu Ning, Tong Li & Chuang-Bai Xiao

Corresponding author

Correspondence to Zhen-Hu Ning.

Appendix

Proof of Theorem 1. By the central limit theorem, for any distribution and sufficiently large j, we have

$$({\mu _2} - {\mu _1})/{\sigma _1}\sim N(0,\,1/j)$$

$${({\sigma _2})^2}/{({\sigma _1})^2}\sim N(1,\,1/(j - 1))$$

Then, writing $\mu=(\mu_2-\mu_1)/\sigma_1$ and $\sigma=\sigma_2/\sigma_1-1$, there exists a positive function $c(j)$, decreasing to zero as $j\to\infty$, such that

$$p\{ |\mu | \leq c(j)\}>1 - \varepsilon$$

$$p\{ |\sigma | \leq c(j)\}>1 - \varepsilon$$

For large j, we deduce that

$$\begin{aligned} d_F((\mu_1,\sigma_1),(\mu_2,\sigma_2)) &= \sqrt{2}\,\ln\left[\frac{F((\mu_1,\sigma_1),(\mu_2,\sigma_2))+(\mu_1-\mu_2)^2+2(\sigma_1^2+\sigma_2^2)}{4\sigma_1\sigma_2}\right] \\ &= \sqrt{2}\,\ln\left[\frac{\sigma_1^2\sqrt{(\mu^2+2\sigma^2)(\mu^2+8+O(\sigma))}+\sigma_1^2\mu^2+4\sigma_1^2\left(1+\sigma+O(\sigma^2)\right)}{4\sigma_1\sigma_2}\right] \end{aligned}$$

Then,

$$\sqrt 2 \ln [{c_1}(r+o(r))+1] \leq {d_F}(({\mu _1},{\sigma _1}),({\mu _2},{\sigma _2})) \leq \sqrt 2 \ln [{c_2}(r+o(r))+1]$$

where $r=\sqrt{\mu^2+\sigma^2}$ and $c_1$, $c_2$ are positive constants. This completes the proof.
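The logarithmic expression for $d_F$ can be checked numerically. This excerpt does not define $F$, so the sketch below assumes $F=\sqrt{(\Delta\mu^2+2(\sigma_1+\sigma_2)^2)(\Delta\mu^2+2(\sigma_1-\sigma_2)^2)}$ with $\Delta\mu=\mu_1-\mu_2$; with that choice the expression coincides with the standard closed form of the Fisher-Rao distance between univariate normals, $\sqrt{2}\,\mathrm{arccosh}\big(1+(\Delta\mu^2/2+(\sigma_1-\sigma_2)^2)/(2\sigma_1\sigma_2)\big)$, and the sketch compares the two. It then fixes $\mu_1=0$, $\sigma_1=1$ (an illustrative normalisation, reading $\mu$ and $\sigma$ as $(\mu_2-\mu_1)/\sigma_1$ and $\sigma_2/\sigma_1-1$, which is what the expansion implies), sets $\mu=r\cos\theta$, $\sigma=r\sin\theta$, and evaluates $d_F/r$ for small $r$. All function names are hypothetical.

```python
import math


def fisher_rao_normal(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """d_F in the logarithmic form used in the proof, with the assumed
    F = sqrt((dmu^2 + 2(s1 + s2)^2) * (dmu^2 + 2(s1 - s2)^2))."""
    dmu2 = (mu1 - mu2) ** 2
    F = math.sqrt((dmu2 + 2 * (s1 + s2) ** 2) * (dmu2 + 2 * (s1 - s2) ** 2))
    return math.sqrt(2) * math.log((F + dmu2 + 2 * (s1 ** 2 + s2 ** 2)) / (4 * s1 * s2))


def fisher_rao_normal_acosh(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """Equivalent hyperbolic form: sqrt(2) * acosh(1 + (dmu^2 / 2 + ds^2) / (2 s1 s2))."""
    return math.sqrt(2) * math.acosh(1 + ((mu1 - mu2) ** 2 / 2 + (s1 - s2) ** 2) / (2 * s1 * s2))


def ratio_to_r(r: float, theta: float) -> float:
    """d_F / r with mu_1 = 0, sigma_1 = 1, so mu = r cos(theta) and sigma = r sin(theta)."""
    mu, sigma = r * math.cos(theta), r * math.sin(theta)
    return fisher_rao_normal(0.0, 1.0, mu, 1.0 + sigma) / r


if __name__ == "__main__":
    print(fisher_rao_normal(0.0, 1.0, 0.3, 1.2), fisher_rao_normal_acosh(0.0, 1.0, 0.3, 1.2))
    for r in (1e-1, 1e-2, 1e-3):
        print(r, [round(ratio_to_r(r, t), 4) for t in (0.0, 0.8, math.pi / 2)])
```

As $r$ shrinks, $d_F/r$ approaches $\sqrt{\cos^2\theta+2\sin^2\theta}$, which lies between 1 and $\sqrt{2}$; this is consistent with the two-sided bound, since $\sqrt{2}\ln[c(r+o(r))+1]$ behaves like $\sqrt{2}\,c\,r$ for small $r$.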

Next, we prove the superiority of $d_F$ over the KLD.

For univariate normal distributions, the KLD has the closed form [56]

$$KLD((\mu_1,\sigma_1)\,\|\,(\mu_2,\sigma_2))=\frac{1}{2}\left[2\ln(\sigma_2/\sigma_1)+\sigma_1^2/\sigma_2^2+(\mu_1-\mu_2)^2/\sigma_2^2-1\right].$$
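As a sanity check on the formula above, the sketch below compares it with a direct numerical integration of $\int p(x)\ln\big(p(x)/q(x)\big)\,dx$ for two normal densities, using a composite Simpson rule over a $\pm 12\sigma_1$ window (the window width and step count are arbitrary choices). It also evaluates the expression with the two distributions swapped; the values differ, because this closed form is the ordinary, direction-dependent KL divergence. Function names are hypothetical.

```python
import math


def kld_normal(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2)), term by term as in the equation."""
    return 0.5 * (2 * math.log(s2 / s1) + s1 ** 2 / s2 ** 2 + (mu1 - mu2) ** 2 / s2 ** 2 - 1)


def log_normal_pdf(x: float, mu: float, s: float) -> float:
    """Log density of N(mu, s^2), computed in log space to avoid underflow in the tails."""
    return -0.5 * ((x - mu) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))


def kld_numeric(mu1: float, s1: float, mu2: float, s2: float, n: int = 20000) -> float:
    """Integral of p(x) * (log p(x) - log q(x)) by composite Simpson's rule on
    mu1 +/- 12 s1; n must be even, and the tails outside the window are ignored."""
    a, b = mu1 - 12 * s1, mu1 + 12 * s1
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        log_p = log_normal_pdf(x, mu1, s1)
        f = math.exp(log_p) * (log_p - log_normal_pdf(x, mu2, s2))
        weight = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        total += weight * f
    return total * h / 3


if __name__ == "__main__":
    print(kld_normal(0.0, 1.0, 1.0, 2.0), kld_numeric(0.0, 1.0, 1.0, 2.0))
    # Swapping the arguments gives a different value: the expression is not symmetric.
    print(kld_normal(1.0, 2.0, 0.0, 1.0))
```

A symmetrized divergence would combine both orders (for example their sum); the check above only concerns the expression as written.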

Then for large j, we have

$$KLD(({\mu _1},{\sigma _1})||({\mu _2},{\sigma _2})) \leq o(\sqrt {{\mu ^2}+{\sigma ^2}} ).$$

Hence, by Theorem 1, with probability at least $1-\varepsilon$,

$$\lim_{j\to\infty} KLD((\mu_1,\sigma_1)\,\|\,(\mu_2,\sigma_2))\big/\sqrt{\mu^2+\sigma^2}=0$$

which implies that $KLD(\cdot)$ is less sensitive than $d_F(\cdot)$ to small differences between the two distributions.
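The limit can be illustrated numerically. The sketch below (function names hypothetical) uses the normalisation $\mu_1=0$, $\sigma_1=1$ and moves along the ray $\mu=r\cos\theta$, $\sigma=r\sin\theta$ with $\theta=0.7$, an arbitrary direction, computing $KLD/r$ and $d_F/r$ for decreasing $r$. It uses the arccosh closed form of the Fisher-Rao distance between univariate normals rather than the logarithmic form in the proof; the two agree when $F$ is the product of the square roots $\sqrt{\Delta\mu^2+2(\sigma_1+\sigma_2)^2}$ and $\sqrt{\Delta\mu^2+2(\sigma_1-\sigma_2)^2}$, which is an assumption here. One should see $KLD/r$ fall roughly in proportion to $r$ while $d_F/r$ stays near a constant above 1.

```python
import math


def kld_normal(mu1, s1, mu2, s2):
    """KL(N(mu1, s1^2) || N(mu2, s2^2)) in closed form."""
    return 0.5 * (2 * math.log(s2 / s1) + s1 ** 2 / s2 ** 2 + (mu1 - mu2) ** 2 / s2 ** 2 - 1)


def fisher_rao_normal(mu1, s1, mu2, s2):
    """Fisher-Rao distance between univariate normals (hyperbolic closed form)."""
    return math.sqrt(2) * math.acosh(1 + ((mu1 - mu2) ** 2 / 2 + (s1 - s2) ** 2) / (2 * s1 * s2))


def sensitivity_ratios(r, theta=0.7):
    """Return (KLD / r, d_F / r) with mu_1 = 0, sigma_1 = 1, mu = r cos(theta), sigma = r sin(theta)."""
    mu2, s2 = r * math.cos(theta), 1.0 + r * math.sin(theta)
    return kld_normal(0.0, 1.0, mu2, s2) / r, fisher_rao_normal(0.0, 1.0, mu2, s2) / r


if __name__ == "__main__":
    for r in (1e-1, 1e-2, 1e-3):
        kld_ratio, fr_ratio = sensitivity_ratios(r)
        print(f"r={r:g}  KLD/r={kld_ratio:.2e}  d_F/r={fr_ratio:.4f}")
```

For small $r$ the KLD behaves like $r^2(\cos^2\theta+2\sin^2\theta)/2$, while $d_F$ behaves like $r\sqrt{\cos^2\theta+2\sin^2\theta}$, which is the quantitative content of the limit above.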

About this article

Cite this article

Wang, M., Ning, ZH., Li, T. et al. Information geometry enhanced fuzzy deep belief networks for sentiment classification. Int. J. Mach. Learn. & Cyber. 10, 3031–3042 (2019). https://doi.org/10.1007/s13042-018-00920-3

Received: 02 May 2018
Accepted: 26 December 2018
Published: 09 January 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s13042-018-00920-3
