Developing a supervised learning-based social media business sentiment index

The Journal of Supercomputing

Abstract

The rapid growth of digital data generation has ushered in the era of big data, which is particularly valuable because approximately 70% of the data collected worldwide comes from social media. Investigating online social network services is therefore of paramount importance. In this paper, we use sentiment analysis, which detects attitudes and emotions toward social issues expressed on social media, to understand the actual economic situation. We propose two steps. In the first step, we train sentiment classifiers on several large social media datasets, considering three types of feature sets: feature vectors, sequence vectors, and a combination of dictionary-based features and sequence vectors. We then assess the performance of six classifiers: MaxEnt-L1, C4.5 decision tree, SVM-kernel, AdaBoost, Naïve Bayes and MaxEnt. In the second step, we collect datasets relevant to several economic words that the public uses to express opinions explicitly. Finally, we use vector autoregression analysis to test our hypothesis. The results show a statistically significant relationship between public sentiment and economic performance. Specifically, the keywords “depression” and “unemployment” lead the KOSPI. The results also show that keywords extracted from the sentiment analysis, such as “price,” “year-end-tax” and “budget deficit,” cause changes in exchange rates.
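
The lead-lag idea behind the vector autoregression step can be illustrated with a minimal sketch. This is not the authors' implementation: it fits a first-order VAR by ordinary least squares on synthetic data, and the function names (`solve_linear`, `ols`, `fit_var1`), the series names and the generating coefficients (0.5, 0.3, 0.8) are all assumptions made for illustration. A real analysis would also select the lag order and run formal significance tests.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, targets):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def fit_var1(sentiment, index):
    """Fit both VAR(1) equations.

    Each series is regressed on [intercept, lagged sentiment, lagged index].
    """
    rows = [[1.0, sentiment[t - 1], index[t - 1]] for t in range(1, len(index))]
    return {
        "sentiment": ols(rows, sentiment[1:]),
        "index": ols(rows, index[1:]),
    }


# Synthetic data (an assumption, not the paper's data): sentiment drives
# the index with a one-period lag, but the index does not drive sentiment.
rng = random.Random(0)
sentiment, index = [0.0], [0.0]
for _ in range(500):
    s_prev, i_prev = sentiment[-1], index[-1]
    sentiment.append(0.5 * s_prev + rng.gauss(0, 1))
    index.append(0.3 * i_prev + 0.8 * s_prev + rng.gauss(0, 1))

coefs = fit_var1(sentiment, index)
print("index equation [const, sentiment(t-1), index(t-1)]:", coefs["index"])
print("sentiment equation [const, sentiment(t-1), index(t-1)]:", coefs["sentiment"])
```

In this setup, a clearly nonzero coefficient on lagged sentiment in the index equation, together with a near-zero coefficient on the lagged index in the sentiment equation, is the pattern one would read as sentiment leading the index.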

Acknowledgements

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2015S1A3A2046711).

Author information

Corresponding author

Correspondence to Min Song.

About this article

Cite this article

Lee, H., Lee, N., Seo, H. et al. Developing a supervised learning-based social media business sentiment index. J Supercomput 76, 3882–3897 (2020). https://doi.org/10.1007/s11227-018-02737-x

  • DOI: https://doi.org/10.1007/s11227-018-02737-x
