Semi-supervised Learning for Affective Common-Sense Reasoning

Abstract

Background

Big social data analysis is the area of research focusing on collecting, examining, and processing large multi-modal and multi-source datasets in order to discover patterns/correlations and extract information from the Social Web. This is usually accomplished through supervised and unsupervised machine learning algorithms that learn from the available data. However, these algorithms are usually highly computationally expensive, in either the training or the prediction phase, and are often unable to handle current data volumes. Parallel approaches have been proposed to boost processing speeds, but these require technologies that support distributed computations.

Methods

Extreme learning machines (ELMs) are an emerging learning paradigm that provides an efficient, unified solution to generalized feed-forward neural networks. ELMs offer significant advantages such as fast learning speed, ease of implementation, and minimal human intervention. However, ELMs cannot be easily parallelized, due to the presence of a pseudo-inverse calculation. This paper therefore aims to find a reliable method for a parallel implementation of ELM that can be applied to the large datasets typical of Big Data problems, using the most recent technology for parallel in-memory computation, i.e., Spark, which is designed to efficiently handle iterative procedures that recursively perform operations over the same data. Moreover, this paper shows how to take advantage of the most recent advances in statistical learning theory (SLT) to select the ELM hyperparameters that give the best generalization performance. This involves assessing the performance of such algorithms (i.e., resampling methods and in-sample methods) by exploiting the most recent results in SLT and adapting them to the Big Data framework. The proposed approach has been tested on two affective analogical reasoning datasets. Affective analogical reasoning can be defined as the intrinsically human capacity to interpret the cognitive and affective information associated with natural language. In particular, we employed two benchmarks, each composed of 21,743 common-sense concepts; each concept is represented according to two models of a semantic network in which common-sense concepts are linked to a hierarchy of affective domain labels.
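
To make the core computation concrete, below is a minimal, illustrative sketch of a basic single-hidden-layer ELM in Python/NumPy. It is not the authors' Spark implementation: the function names and hyperparameter values (n_hidden, reg) are ours, and a simple hold-out search stands in for the SLT-based model selection strategies studied in the paper. The sketch highlights the regularized pseudo-inverse step that makes naive parallelization difficult.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, n_hidden=100, reg=1e-3):
    """Fit only the output weights; hidden-layer weights stay random."""
    d = X.shape[1]
    W = rng.normal(size=(d, n_hidden))   # random input weights (never trained)
    b = rng.normal(size=n_hidden)        # random hidden biases
    H = np.tanh(X @ W + b)               # hidden-layer activation matrix
    # Regularized pseudo-inverse solve: beta = (H^T H + reg*I)^{-1} H^T y.
    # This linear-algebra step is the part that requires a distributed
    # implementation (e.g., on Spark) for Big Data volumes.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy usage with a simple hold-out split; the paper instead selects
# (n_hidden, reg) with SLT-based resampling and in-sample strategies.
X = rng.normal(size=(1000, 20))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=1000))
X_tr, y_tr, X_va, y_va = X[:800], y[:800], X[800:], y[800:]

best = None
for n_hidden in (50, 100, 200):
    for reg in (1e-3, 1e-1, 1e1):
        model = elm_train(X_tr, y_tr, n_hidden, reg)
        err = np.mean(np.sign(elm_predict(model, X_va)) != y_va)
        if best is None or err < best[0]:
            best = (err, n_hidden, reg)

print("best hold-out error %.3f with n_hidden=%d, reg=%g" % best)
```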

Results

The labeled data have been split into two sets: the first 20,000 samples have been used for building the model with the ELM under the different SLT strategies, while the remaining 1743 labeled samples have been kept apart as a reference set to test the performance of the learned model. The splitting process has been repeated 30 times in order to obtain statistically relevant results. We ran the experiments on the Google Cloud Platform, in particular the Google Compute Engine, with NM = 4 machines, each with two cores and 1.8 GB of RAM (machine type n1-highcpu-2) and a 30 GB HDD, equipped with Spark. Results on the affective datasets both show the effectiveness of the proposed parallel approach and indicate the most suitable SLT strategies for this specific Big Data problem.
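
The splitting protocol above can be summarized in a few lines of code. This is our illustration of the described procedure, not the authors' code; train_fn and error_fn are hypothetical placeholders for, respectively, building the ELM model with a given SLT strategy and measuring its error on the held-out reference set.

```python
import numpy as np

rng = np.random.default_rng(0)
n_total, n_train, n_rep = 21743, 20000, 30   # sizes and repetitions from the text

def repeated_split_error(train_fn, error_fn, X, y):
    """Mean/std of the test error over n_rep random 20,000/1743 splits."""
    errors = []
    for _ in range(n_rep):
        idx = rng.permutation(n_total)           # fresh random shuffle per repetition
        tr, te = idx[:n_train], idx[n_train:]    # 20,000 train / 1743 reference
        model = train_fn(X[tr], y[tr])           # model building + model selection
        errors.append(error_fn(model, X[te], y[te]))
    return np.mean(errors), np.std(errors)       # statistics over the 30 repetitions
```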

Conclusion

In this paper, we showed how to build an ELM model with a novel scalable approach and how to carefully assess its performance, using the most recent results from SLT, on a sentiment analysis problem. Thanks to recent technologies and methods, the computational requirements of these methods have been reduced, allowing them to scale to the large datasets typical of Big Data applications.

Notes

  1. In this paper, we deal with a frequentist approach, which derives confidence intervals for quantities of interest, but the credible intervals of the Bayesian approach can be addressed equally well in the parametric setting [30].

  2. We have exploited the property \(\sqrt{2ab} \le \frac{a}{2} + b\), which follows from the arithmetic-geometric mean inequality \(\sqrt{xy} \le \frac{x+y}{2}\) with \(x = a\) and \(y = 2b\), in order to remove all the constant terms which do not depend on \({\widehat{\beta }}_{\text{loo}}({\mathscr{A}}_{\mathscr{H}}, {\sqrt{n}}/{2})\).

References

  1. Cambria E. Affective computing and sentiment analysis. IEEE Intell Syst. 2016;31(2):102–7.

  2. Saif H, He Y, Fernandez M, Alani H. Contextual semantics for sentiment analysis of Twitter. Inf Process Manag. 2016;52(1):5–19.

  3. Xia R, Xu F, Yu J, Qi Y, Cambria E. Polarity shift detection, elimination and ensemble: a three-stage model for document-level sentiment analysis. Inf Process Manag. 2016;52(1):36–45.

  4. Balahur A, Jacquet G. Sentiment analysis meets social media: challenges and solutions of the field in view of the current information sharing context. Inf Process Manag. 2015;51(4):428–32.

  5. Google. Announcing SyntaxNet: the world’s most accurate parser goes open source. http://googleresearch.blogspot.it/2016/05/announcing-syntaxnet-worlds-most.html. 2016.

  6. Roy RS, Agarwal S, Ganguly N, Choudhury M. Syntactic complexity of web search queries through the lenses of language models, networks and users. Inf Process Manag. 2016;52(5):923–48.

  7. Abainia K, Ouamour S, Sayoud H. Effective language identification of forum texts based on statistical approaches. Inf Process Manag. 2016;52(4):491–512.

  8. Sun J, Wang G, Cheng X, Fu Y. Mining affective text to improve social media item recommendation. Inf Process Manag. 2015;51(4):444–57.

  9. Cambria E, Hussain A. Sentic computing: a common-sense-based framework for concept-level sentiment analysis. Cham: Springer; 2015.

  10. Poria S, Cambria E, Howard N, Huang G-B, Hussain A. Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing. 2016;174:50–9.

  11. Wang Q, Cambria E, Liu C, Hussain A. Common sense knowledge for handwritten Chinese recognition. Cogn Comput. 2013;5(2):234–42.

  12. Cambria E, Hussain A, Durrani T, Havasi C, Eckl C, Munro J. Sentic computing for patient centered applications. In: IEEE ICSP, Beijing; 2010. p. 1279–82.

  13. Cambria E, Gastaldo P, Bisio F, Zunino R. An ELM-based model for affective analogical reasoning. Neurocomputing. 2015;149:443–55.

  14. Cambria E, Fu J, Bisio F, Poria S. AffectiveSpace 2: enabling affective intuition for concept-level sentiment analysis. In: AAAI, Austin; 2015. p. 508–14.

  15. Cambria E, Wang H, White B. Guest editorial: big social data analysis. Knowl Based Syst. 2014;69:1–2.

  16. Chakraborty M, Pal S, Pramanik R, Chowdary CR. Recent developments in social spam detection and combating techniques: a survey. Inf Process Manag. 2016;52(6):1053–73.

  17. Kranjc J, Smailović J, Podpečan V, Grčar M, Žnidaršič M, Lavrač N. Active learning for sentiment analysis on data streams: methodology and workflow implementation in the ClowdFlows platform. Inf Process Manag. 2015;51(2):187–203.

  18. Fersini E, Messina E, Pozzi FA. Expressive signals in social media languages to improve polarity detection. Inf Process Manag. 2016;52(1):20–35.

  19. Cambria E, Livingstone A, Hussain A. The hourglass of emotions. In: Esposito A, Esposito AM, Vinciarelli A, Hoffmann R, Müller CC, editors. Cognitive behavioural systems. Berlin Heidelberg: Springer; 2012. p. 144–57.

  20. Huang G-B, Wang DH, Lan Y. Extreme learning machines: a survey. Int J Mach Learn Cybern. 2011;2(2):107–22.

  21. Huang G, Song S, Gupta JND, Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern. 2014;44(12):2405–17.

  22. Cambria E, Huang G-B, et al. Extreme learning machines. IEEE Intell Syst. 2013;28(6):30–59.

  23. Huang G-B, Cambria E, Toh K-A, Widrow B, Xu Z. New trends of learning in computational intelligence. IEEE Comput Intell Mag. 2015;10(2):16–7.

  24. Chapelle O, Schölkopf B, Zien A, et al. Semi-supervised learning. Cambridge: MIT Press; 2006.

  25. Zhu X. Semi-supervised learning literature survey. Madison: University of Wisconsin; 2005.

  26. Habernal I, Ptáček T, Steinberger J. Supervised sentiment analysis in Czech social media. Inf Process Manag. 2014;50(5):693–707.

  27. Guo Z, Zhang ZM, Xing EP, Faloutsos C. Semi-supervised learning based on semiparametric regularization. In: SDM, vol. 8. SIAM; 2008. p. 132–42.

  28. Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res. 2006;7:2399–434.

  29. Draper NR, Smith H, Pownell E. Applied regression analysis. New York: Wiley; 1966.

  30. MacKay DJC. Bayesian interpolation. Neural Comput. 1992;4(3):415–47.

  31. Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci. 2001;16(3):199–231.

  32. Dhar V. Data science and prediction. Commun ACM. 2013;56(12):64–73.

  33. Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw. 1999;10(5):988–99.

  34. Wolpert DH. The lack of a priori distinctions between learning algorithms. Neural Comput. 1996;8(7):1341–90.

  35. Magdon-Ismail M. No free lunch for noise prediction. Neural Comput. 2000;12(3):547–64.

  36. Vapnik VN. Statistical learning theory. New York: Wiley-Interscience; 1998.

  37. Valiant LG. A theory of the learnable. Commun ACM. 1984;27(11):1134–42.

  38. Bartlett PL, Boucheron S, Lugosi G. Model selection and error estimation. Mach Learn. 2002;48(1–3):85–113.

  39. Langford J. Tutorial on practical prediction theory for classification. J Mach Learn Res. 2005;6:273–306.

  40. Anguita D, Ghio A, Oneto L, Ridella S. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Trans Neural Netw Learn Syst. 2012;23(9):1390–406.

  41. Kohavi R, et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International joint conference on artificial intelligence; 1995.

  42. Efron B, Tibshirani RJ. An introduction to the bootstrap. London: Chapman & Hall; 1993.

  43. Oneto L, Ghio A, Ridella S, Anguita D. Fully empirical and data-dependent stability-based bounds. IEEE Trans Cybern. 2015;45(9):1913–26.

  44. Anguita D, Ghio A, Oneto L, Ridella S. A deep connection between the Vapnik–Chervonenkis entropy and the Rademacher complexity. IEEE Trans Neural Netw Learn Syst. 2014;25(12):2202–11.

  45. Oneto L, Ghio A, Ridella S, Anguita D. Global Rademacher complexity bounds: from slow to fast convergence rates. Neural Process Lett. 2016;43(2):567–602.

  46. Bartlett PL, Bousquet O, Mendelson S. Local Rademacher complexities. Ann Stat. 2005;33(4):1497–537.

  47. Oneto L, Ghio A, Ridella S, Anguita D. Local Rademacher complexity: sharper risk bounds with and without unlabeled samples. Neural Netw. 2015 (in press).

  48. Lei Y, Binder A, Dogan Ü, Kloft M. Theory and algorithms for the localized setting of learning kernels. Neural Inf Process Syst. 2015;173–95. http://www.jmlr.org/proceedings/papers/v44/LeiBinDogKlo15.pdf.

  49. McAllester DA. Some PAC-Bayesian theorems. In: Proceedings of the eleventh annual conference on computational learning theory. ACM; 1998. p. 230–4.

  50. Lever G, Laviolette F, Shawe-Taylor J. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoret Comput Sci. 2013;473:4–28.

  51. Germain P, Lacasse A, Laviolette F, Marchand M, Roy JF. Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm. J Mach Learn Res. 2015;16(4):787–860.

  52. Bégin L, Germain P, Laviolette F, Roy JF. PAC-Bayesian bounds based on the Rényi divergence. In: International conference on artificial intelligence and statistics; 2016.

  53. Floyd S, Warmuth M. Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Mach Learn. 1995;21(3):269–304.

  54. Langford J, McAllester DA. Computable shell decomposition bounds. In: Proceedings of the thirteenth annual conference on computational learning theory; 2000. p. 25–34.

  55. Bousquet O, Elisseeff A. Stability and generalization. J Mach Learn Res. 2002;2:499–526.

  56. Poggio T, Rifkin R, Mukherjee S, Niyogi P. General conditions for predictivity in learning theory. Nature. 2004;428(6981):419–22.

  57. Guyon I, Saffari A, Dror G, Cawley G. Model selection: beyond the Bayesian/frequentist divide. J Mach Learn Res. 2010;11:61–87.

  58. Huang GB. What are extreme learning machines? Filling the gap between Frank Rosenblatt’s dream and John von Neumann’s puzzle. Cogn Comput. 2015;7(3):263–78.

  59. Huang Z, Yu Y, Gu J, Liu H. An efficient method for traffic sign recognition based on extreme learning machine. IEEE Trans Cybern. doi:10.1109/TCYB.2016.2533424.

  60. Huang GB, Bai Z, Kasun LLC, Vong CM. Local receptive fields based extreme learning machine. IEEE Comput Intell Mag. 2015;10(2):18–29.

  61. Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern B Cybern. 2012;42(2):513–29.

  62. Bisio F, Decherchi S, Gastaldo P, Zunino R. Inductive bias for semi-supervised extreme learning machine. In: Proceedings of ELM-2014, vol. 1; 2015.

  63. Dinuzzo F, Schölkopf B. The representer theorem for Hilbert spaces: a necessary and sufficient condition. In: Advances in neural information processing systems; 2012. p. 189–96.

  64. Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: International conference on computational learning theory. Springer Berlin Heidelberg; 2001. p. 416–26.

  65. Erhan D, Bengio Y, Courville A, Manzagol P-A, Vincent P, Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010;11:625–60.

  66. Salakhutdinov R, Hinton G. An efficient learning procedure for deep Boltzmann machines. Neural Comput. 2012;24(8):1967–2006.

  67. Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Stat Surv. 2010;4:40–79.

  68. McAllester DA. PAC-Bayesian stochastic model selection. Mach Learn. 2003;51(1):5–21.

  69. Anguita D, Ghio A, Oneto L, Ridella S. In-sample model selection for support vector machines. In: International joint conference on neural networks; 2011.

  70. Koltchinskii V. Rademacher penalties and structural risk minimization. IEEE Trans Inf Theory. 2001;47(5):1902–14.

  71. Inoue A, Kilian L. In-sample or out-of-sample tests of predictability: which one should we use? Econom Rev. 2005;23(4):371–402.

  72. Cheng F, Yu J, Xiong H. Facial expression recognition in JAFFE dataset based on Gaussian process classification. IEEE Trans Neural Netw. 2010;21(10):1685–90.

  73. Shalev-Shwartz S, Ben-David S. Understanding machine learning: from theory to algorithms. Cambridge: Cambridge University Press; 2014.

  74. Hoeffding W. Probability inequalities for sums of bounded random variables. J Am Stat Assoc. 1963;58(301):13–30.

  75. Anguita D, Ghio A, Ridella S, Sterpi D. K-fold cross validation for error rate estimate in support vector machines. In: International conference on data mining; 2009.

  76. Vapnik VN, Kotz S. Estimation of dependences based on empirical data, vol. 41. New York: Springer; 1982.

  77. Shawe-Taylor J, Bartlett PL, Williamson RC, Anthony M. Structural risk minimization over data-dependent hierarchies. IEEE Trans Inf Theory. 1998;44(5):1926–40.

  78. Boucheron S, Lugosi G, Massart P. A sharp concentration inequality with applications. Random Struct Algorithms. 2000;16(3):277–92.

  79. Boucheron S, Lugosi G, Massart P. Concentration inequalities: a nonasymptotic theory of independence. Oxford: Oxford University Press; 2013.

  80. Bartlett PL, Mendelson S. Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res. 2003;3:463–82.

  81. Laviolette F, Marchand M. PAC-Bayes risk bounds for stochastic averages and majority votes of sample-compressed classifiers. J Mach Learn Res. 2007;8(7):1461–87.

  82. Lacasse A, Laviolette F, Marchand M, Germain P, Usunier N. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In: Advances in neural information processing systems; 2006. p. 769–76.

  83. Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.

  84. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  85. Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat. 1998;26(5):1651–86.

  86. Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Mach Learn. 1999;37(3):297–336.

  87. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis, vol. 2. London: Taylor & Francis; 2014.

  88. Rakhlin A, Mukherjee S, Poggio T. Stability results in learning theory. Anal Appl. 2005;3(04):397–417.

  89. Devroye L, Györfi L, Lugosi G. A probabilistic theory of pattern recognition. Berlin: Springer; 1996.

  90. Dietrich R, Opper M, Sompolinsky H. Statistical mechanics of support vector networks. Phys Rev Lett. 1999;82(14):2975.

  91. Li M, Vitányi P. An introduction to Kolmogorov complexity and its applications. New York: Springer; 2013.

  92. Grünwald PD. The minimum description length principle. Cambridge: MIT Press; 2007.

  93. Tikhonov AN, Arsenin VI. Solutions of ill-posed problems. New York: V.H. Winston & Sons; 1977.

  94. Boyd S, Vandenberghe L. Convex optimization. Cambridge: Cambridge University Press; 2004.

  95. Serfling RJ. Probability inequalities for the sum in sampling without replacement. Ann Stat. 1974;2(1):39–48.

  96. Zhu X, Goldberg AB. Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn. 2009;3(1):1–130.

  97. Anguita D, Ghio A, Oneto L, Ridella S. In-sample model selection for trimmed hinge loss support vector machine. Neural Process Lett. 2012;36(3):275–83.

  98. Bartlett PL, Long PM, Williamson RC. Fat-shattering and the learnability of real-valued functions. In: Proceedings of the seventh annual conference on computational learning theory. ACM; 1994. p. 299–310.

  99. Zhou D-X. The covering number in learning theory. J Complex. 2002;18(3):739–67.

  100. Massart P. Some applications of concentration inequalities to statistics. Ann Fac Sci Toulouse Math. 2000;9(2):245–303.

  101. Ivanov VV. The theory of approximate methods and their applications to the numerical solution of singular integral equations. US: Springer Science & Business Media; 1976.

  102. Pelckmans K, Suykens JA, De Moor B. Morozov, Ivanov and Tikhonov regularization based LS-SVMs. In: International conference on neural information processing. Springer Berlin Heidelberg; 2004. p. 1216–22.

  103. Oneto L, Anguita D, Ghio A, Ridella S. The impact of unlabeled patterns in Rademacher complexity theory for kernel classifiers. In: Advances in neural information processing systems; 2011. p. 585–93.

  104. Anguita D, Ghio A, Oneto L, Ridella S. Unlabeled patterns to tighten Rademacher complexity error bounds for kernel classifiers. Pattern Recognit Lett. 2014;37:210–9.

  105. Eckart C, Young G. The approximation of one matrix by another of lower rank. Psychometrika. 1936;1(3):211–8.

Author information

Corresponding author

Correspondence to Erik Cambria.

Ethics declarations

Conflict of Interest

Luca Oneto, Federica Bisio, Erik Cambria, and Davide Anguita declare that they have no conflict of interest.

Informed Consent

Informed consent was not required as no humans or animals were involved.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.


Cite this article

Oneto, L., Bisio, F., Cambria, E. et al. Semi-supervised Learning for Affective Common-Sense Reasoning. Cogn Comput 9, 18–42 (2017). https://doi.org/10.1007/s12559-016-9433-5
