short-paper

Handling Out-of-Vocabulary Words in Lexicons to Polarity Classification

Authors:

Gabriel Nascimento,

Fellipe Duarte,

Gustavo Paiva Guedes

IHC '18: Proceedings of the 17th Brazilian Symposium on Human Factors in Computing Systems

Article No.: 47, Pages 1 - 5

https://doi.org/10.1145/3274192.3274239

Published: 22 October 2018

Abstract

Emotions play an important role in Human-Computer Interaction (HCI). Sentiment Analysis (SA) aims to detect these emotions in text, and some SA tasks use lexicons to infer the valence polarity of a text. Moreover, attributes extracted from lexicons such as WordNet and LIWC are widely used in SA tasks. However, one of the major challenges in using these lexicons is that some words are absent from their vocabulary; these out-of-vocabulary words may carry valuable information for the SA task and therefore should not be discarded. This paper proposes a new algorithm, named IKLex, that uses word embeddings to infer features for words that are out of the vocabulary of LIWC lexicons. Experiments with IKLex show promising results when state-of-the-art classifiers are applied to the polarity classification task on two datasets in different languages: Brazilian Portuguese and English. The F1 score of the evaluated classifiers improved by at least 1%.
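The abstract does not say how IKLex works internally, so the following is not the authors' algorithm. It is a minimal sketch of one generic way to give lexicon categories to an out-of-vocabulary word using word embeddings: a vote among its nearest in-lexicon neighbors, ranked by cosine similarity. The function name `infer_categories`, the parameters `k` and `min_votes`, the toy two-dimensional vectors, and the category labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_categories(word, embeddings, lexicon, k=3, min_votes=2):
    """Hypothetical helper: return lexicon categories for `word`.

    In-lexicon words keep their own categories. An out-of-vocabulary
    word receives every category that at least `min_votes` of its `k`
    nearest in-lexicon neighbors (by cosine similarity) share.
    """
    if word in lexicon:
        return set(lexicon[word])
    if word not in embeddings:
        return set()  # no vector available, so nothing can be inferred
    target = embeddings[word]
    # Rank only the words that have both a lexicon entry and a vector.
    scored = sorted(
        ((cosine(target, embeddings[w]), w) for w in lexicon if w in embeddings),
        reverse=True,
    )
    votes = Counter()
    for _, neighbor in scored[:k]:
        votes.update(lexicon[neighbor])
    return {cat for cat, n in votes.items() if n >= min_votes}


# Toy data (assumed): tiny 2-d vectors and illustrative category labels.
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "happy": [0.85, 0.15],
    "bad": [0.1, 0.9],
    "awful": [0.2, 0.8],
    "awesome": [0.88, 0.12],  # has a vector but no lexicon entry
}
LEXICON = {
    "good": {"posemo"},
    "great": {"posemo"},
    "happy": {"posemo", "affect"},
    "bad": {"negemo"},
    "awful": {"negemo"},
}

print(infer_categories("awesome", EMBEDDINGS, LEXICON))
```

In this toy setup the three nearest neighbors of "awesome" are all positive words, so the vote yields a single positive category. With real embeddings the choice of `k` and `min_votes` would need tuning, and a classifier would then consume the inferred categories as ordinary lexicon features.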

Published In

IHC '18: Proceedings of the 17th Brazilian Symposium on Human Factors in Computing Systems

October 2018

488 pages

ISBN: 9781450366014

DOI: 10.1145/3274192

Conference Chairs: Marcelle Mota, Universidade Federal do Pará (UFPA); Bianchi Serique, Universidade Federal do Pará (UFPA)

Program Chairs: Raquel O. Prates, Universidade Federal de Minas Gerais (UFMG); Heloísa Candello, IBM Research - Brazil

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SBC: Brazilian Computer Society
SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

IHC 2018: 17th Brazilian Symposium on Human Factors in Computing Systems

October 22 - 26, 2018

Belém, Brazil

Acceptance Rates

IHC '18 Paper Acceptance Rate 42 of 166 submissions, 25%;

Overall Acceptance Rate 331 of 973 submissions, 34%
