
Concatenating or Averaging? Hybrid Sentences Representations for Sentiment Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11314)

Abstract

Performance in sentiment analysis, the task of automatically classifying the vast number of user opinions generated online, relies heavily on the representation used to transform words or sentences into numbers. In machine learning for sentiment analysis, the most common representation is the bag-of-words (BOW) model, which works well in practice but is essentially a lexical conversion. Another well-known method is Word2vec, which instead attempts to capture the meaning of terms. Because the two models encode complementary information, the knowledge offered by Word2vec can enrich the information contained in the BOW scheme. Based on this assumption, we designed and tested four hybrid sentence representations that combine the two approaches. Experiments on publicly available datasets confirm the effectiveness of the hybrid embeddings, which led to a stable increase in performance across different sentiment analysis domains.
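The abstract does not spell out the four hybrid representations, so the following is only a minimal sketch of one plausible way to combine the two models: a BOW count vector concatenated with the average of the sentence's word vectors. The names `TOY_VECTORS`, `bow_vector`, `averaged_embedding`, and `concatenated_hybrid` are hypothetical, and the two-dimensional vectors are invented stand-ins for real Word2vec embeddings.

```python
from collections import Counter

# Hypothetical toy "embeddings" standing in for pretrained Word2vec vectors.
TOY_VECTORS = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}
VOCAB = sorted(TOY_VECTORS)  # fixed vocabulary order: bad, good, movie
DIM = 2                      # dimensionality of the toy vectors


def bow_vector(tokens):
    """Lexical part: one count per vocabulary word."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in VOCAB]


def averaged_embedding(tokens):
    """Semantic part: mean of the vectors of all known tokens."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        # No known token: fall back to the zero vector.
        return [0.0] * DIM
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIM)]


def concatenated_hybrid(tokens):
    """Assumed hybrid: BOW counts followed by the averaged embedding."""
    return bow_vector(tokens) + averaged_embedding(tokens)


print(concatenated_hybrid(["good", "movie"]))
```

In this sketch the first three components carry the lexical counts and the last two carry the averaged meaning vector, so a downstream classifier sees both kinds of information. The schemes evaluated in the paper may differ from this one.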


Notes

  1. http://jmcauley.ucsd.edu/data/amazon/.

  2. https://code.google.com/archive/p/word2vec/.

Author information

Corresponding author

Correspondence to Claudia Volpetti.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Orsenigo, C., Vercellis, C., Volpetti, C. (2018). Concatenating or Averaging? Hybrid Sentences Representations for Sentiment Analysis. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_59

  • DOI: https://doi.org/10.1007/978-3-030-03493-1_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03492-4

  • Online ISBN: 978-3-030-03493-1

  • eBook Packages: Computer Science, Computer Science (R0)
