Abstract
Ensuring viewpoint diversity in mass media is a long-standing challenge, and recent political events, together with the ever-increasing use of the Internet, have made it a more critical and contentious issue. This research explores the relationship between semantic structures and viewpoint, demonstrating that the viewpoint diversity of a selection of documents can be increased by using extracted semantic and sentiment features. Small portions of documents matching search terms were embedded in a semantic space using word vectors and sentiment scores. The resulting features were used to train a support vector machine to differentiate documents by viewpoint in a topically homogeneous corpus. When the top 10% most probable predictions for each viewpoint were evaluated, this approach yielded a lift between 1.26 and 2.04.
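The abstract compresses the pipeline into a few sentences. What follows is a minimal sketch of its two most concrete pieces: building a feature vector from the text around search-term hits, and measuring lift over the top 10% of predictions. It is written in plain Python and makes several assumptions. It uses a fixed-size token window around each hit and averages the word vectors inside those windows. A toy polarity lexicon stands in for a real sentiment scorer such as VADER or TextBlob. Lift takes its usual definition: precision within the top-ranked slice divided by the viewpoint's base rate. The vectors, lexicon, window size, and every function name are hypothetical and are not taken from the authors' repository.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def context_windows(tokens: Sequence[str], search_terms: Set[str],
                    window: int = 3) -> List[List[str]]:
    """Return the tokens within `window` positions of each search-term hit."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok in search_terms:
            start = max(0, i - window)
            spans.append(list(tokens[start:i + window + 1]))
    return spans


def mean_vector(words: Sequence[str], vectors: Dict[str, Vector],
                dim: int) -> Vector:
    """Average the vectors of in-vocabulary words; zeros if none are known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def mean_sentiment(words: Sequence[str], lexicon: Dict[str, float]) -> float:
    """Average polarity of words found in a (hypothetical) sentiment lexicon."""
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def document_features(tokens: Sequence[str], search_terms: Set[str],
                      vectors: Dict[str, Vector], lexicon: Dict[str, float],
                      dim: int, window: int = 3) -> Vector:
    """Concatenate the mean word vector and mean sentiment of all windows."""
    words = [w for span in context_windows(tokens, search_terms, window)
             for w in span]
    return mean_vector(words, vectors, dim) + [mean_sentiment(words, lexicon)]


def lift_at_top(probabilities: Sequence[float], labels: Sequence[str],
                positive: str, fraction: float = 0.10) -> float:
    """Precision among the top `fraction` most probable documents for
    `positive`, divided by that viewpoint's base rate in the whole set."""
    n = len(labels)
    base_rate = sum(label == positive for label in labels) / n
    if base_rate == 0:
        raise ValueError("no documents carry the positive viewpoint")
    k = max(1, int(n * fraction))  # size of the top slice, at least one document
    top = sorted(range(n), key=lambda i: probabilities[i], reverse=True)[:k]
    precision = sum(labels[i] == positive for i in top) / k
    return precision / base_rate


# Toy usage: 2-dimensional vectors and a two-word lexicon (illustrative only).
vectors = {"tax": [1.0, 0.0], "unfair": [0.0, 1.0], "rich": [1.0, 1.0]}
lexicon = {"unfair": -0.5, "good": 0.5}
tokens = "the tax is unfair to the rich".split()
print(document_features(tokens, {"tax"}, vectors, lexicon, dim=2, window=2))
# → [0.5, 0.5, -0.5]

probs = [0.9, 0.8, 0.7] + [0.1] * 17
labels = ["pro", "con", "pro", "pro", "pro", "pro"] + ["con"] * 14
print(lift_at_top(probs, labels, "pro"))
# → 2.0  (top 2 of 20 are 50% "pro"; the base rate is 25%)
```

In a full pipeline the feature rows would be passed to an SVM that outputs class probabilities, for example scikit-learn's `SVC(probability=True)` with its `predict_proba` method, and those probabilities would then be ranked by something like `lift_at_top`. Averaging keeps every document at a fixed dimension no matter how many windows match, at the cost of discarding word order. Overlapping windows count shared words more than once.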
Notes
1. Source code is available at https://github.com/jeffharwell/viewpointdiversity.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Harwell, J., Li, Y. (2022). Classifying Documents by Viewpoint Using Word2Vec and Support Vector Machines. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_13
DOI: https://doi.org/10.1007/978-3-031-08473-7_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7
eBook Packages: Computer Science, Computer Science (R0)