DOI: 10.1145/3274192.3274239
short-paper

Handling Out-of-Vocabulary Words in Lexicons to Polarity Classification

Published: 22 October 2018

Abstract

Emotions play an important role in Human-Computer Interaction (HCI). Sentiment Analysis (SA) aims to detect these emotions in text, and some SA tasks use lexicons to infer the valence polarity of a text. Attributes extracted from lexicons such as WordNet and LIWC are widely used in SA tasks. However, a major challenge in using these lexicons is that many words are missing from their vocabulary; such words may carry valuable information for the SA task and therefore cannot simply be discarded. This paper proposes a new algorithm, named IKLex, that uses word embeddings to infer features for words that are out of the vocabulary of LIWC lexicons. Experiments with IKLex show promising results when state-of-the-art classifiers for the polarity classification task are applied to two datasets in different languages: Brazilian Portuguese and English. The F1 score of the evaluated classifiers improved by at least 1%.
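The abstract does not describe how IKLex works internally, so the following is only a generic illustration of the underlying idea, not the authors' algorithm: an out-of-vocabulary word borrows lexicon categories from its nearest in-lexicon neighbours in embedding space. The function name `infer_categories`, the parameters `k` and `min_votes`, the majority-vote rule, and the toy vectors and category labels are all assumptions made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_categories(word, embeddings, lexicon, k=3, min_votes=2):
    """Guess lexicon categories for a word (hypothetical helper, not IKLex).

    Words already in the lexicon keep their categories. For an
    out-of-vocabulary word, the k most similar in-lexicon words vote,
    and a category is kept if at least min_votes neighbours carry it.
    """
    if word in lexicon:
        return set(lexicon[word])
    if word not in embeddings:
        return set()  # no vector available, nothing can be inferred

    target = embeddings[word]
    # Rank every lexicon word that has an embedding by similarity.
    scored = sorted(
        ((cosine(target, embeddings[w]), w) for w in lexicon if w in embeddings),
        reverse=True,
    )

    votes = {}
    for _, neighbour in scored[:k]:
        for category in lexicon[neighbour]:
            votes[category] = votes.get(category, 0) + 1
    return {c for c, n in votes.items() if n >= min_votes}


# Toy two-dimensional embeddings and a toy lexicon (illustrative values only).
embeddings = {
    "happy": [0.90, 0.10],
    "glad": [0.80, 0.20],
    "joyful": [0.85, 0.15],
    "sad": [0.10, 0.90],
    "awful": [0.20, 0.80],
    "happyyy": [0.88, 0.12],  # informal spelling, absent from the lexicon
}
lexicon = {
    "happy": {"posemo", "affect"},
    "glad": {"posemo", "affect"},
    "joyful": {"posemo"},
    "sad": {"negemo", "affect"},
    "awful": {"negemo"},
}

result = infer_categories("happyyy", embeddings, lexicon)
print(sorted(result))  # ['affect', 'posemo']
```

In practice the vectors would come from a pretrained embedding model for the target language, and the inferred categories would be added to the feature vector passed to the polarity classifier. The vote threshold trades coverage against noise: a lower `min_votes` labels more words but admits more spurious categories.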


Published In

IHC '18: Proceedings of the 17th Brazilian Symposium on Human Factors in Computing Systems
October 2018
488 pages
ISBN:9781450366014
DOI:10.1145/3274192
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LIWC
  2. OOV
  3. lexicon
  4. out-of-vocabulary
  5. word embeddings

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

IHC 2018

Acceptance Rates

IHC '18 Paper Acceptance Rate 42 of 166 submissions, 25%;
Overall Acceptance Rate 331 of 973 submissions, 34%
