DOI: 10.1145/3106426.3106521

Semantically readable distributed representation learning for social media mining

Published: 23 August 2017

Abstract

The problem with distributed representations generated by neural networks is that the meaning of the features is difficult to understand. We propose a new method that gives a specific meaning to each node of a hidden layer by introducing a manually created word semantic vector dictionary into the initial weights and by using paragraph vector models. Our experimental results demonstrated that weights obtained through learning and weights based on the dictionary are more strongly correlated in a closed test and more weakly correlated in an open test, compared with the results of a control test. Additionally, we found that the learned vectors outperform the existing paragraph vectors in a sentiment analysis task. Finally, we evaluated the readability of the document embeddings in a user test; in this paper, readability means that people can understand the meaning of the heavily weighted features of a distributed representation. In total, 52.4% of the top five weighted hidden nodes were related to the tweets for which one of the paragraph vector models had learned the document embeddings. Because each hidden node maintains a specific meaning, the proposed method succeeds in improving readability.
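The seeding idea lends itself to a short illustration. Below is a minimal sketch (not the authors' implementation) of initializing a paragraph vector model's word-embedding weights from a hand-built semantic vector dictionary, then checking how strongly the learned weights still correlate with the dictionary and which labeled hidden nodes dominate a document vector. The `semantic_dict`, `node_labels`, and toy tweets are hypothetical stand-ins, and gensim's Doc2Vec is used in place of the paper's own paragraph vector models.

```python
import numpy as np
from scipy.stats import pearsonr
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical dictionary: the i-th component of each word vector
# corresponds to the human-labeled concept node_labels[i].
node_labels = ["food", "sports", "music", "travel"]
semantic_dict = {
    "ramen":  np.array([0.9, 0.0, 0.0, 0.1], dtype=np.float32),
    "soccer": np.array([0.0, 0.9, 0.1, 0.0], dtype=np.float32),
    "guitar": np.array([0.0, 0.1, 0.9, 0.0], dtype=np.float32),
    "flight": np.array([0.1, 0.0, 0.0, 0.9], dtype=np.float32),
}

# Toy tweet corpus, tagged for doc2vec training.
tweets = [["ramen", "soccer"], ["guitar", "flight"], ["ramen", "guitar"]]
corpus = [TaggedDocument(words, [i]) for i, words in enumerate(tweets)]

# PV-DM model whose word vectors form the hidden layer we seed.
model = Doc2Vec(dm=1, vector_size=len(node_labels), min_count=1, epochs=50)
model.build_vocab(corpus)

# Replace the random initial weights with the dictionary vectors.
for word, vec in semantic_dict.items():
    if word in model.wv.key_to_index:
        model.wv.vectors[model.wv.key_to_index[word]] = vec

model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Closed-test style check: correlation between learned and dictionary weights.
for word, vec in semantic_dict.items():
    r, _ = pearsonr(model.wv[word], vec)
    print(f"{word}: r = {r:+.2f}")

# Readability check: name the top weighted nodes of one document vector.
doc_vec = model.dv[0]
top = np.argsort(-np.abs(doc_vec))[:2]
print("strongest nodes for tweet 0:", [node_labels[i] for i in top])
```

Because every dimension of the hidden layer keeps the human-assigned label it started with, reading off the top weighted nodes of a document vector yields the kind of interpretation the paper's user test measured.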


    Published In

    WI '17: Proceedings of the International Conference on Web Intelligence
    August 2017
    1284 pages
    ISBN: 9781450349512
    DOI: 10.1145/3106426

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 August 2017


    Author Tags

    1. Twitter
    2. distributed representation learning
    3. paragraph vector
    4. semantic lexicon
    5. semantic vector
    6. sentiment analysis
    7. word2vec

    Qualifiers

    • Research-article

    Funding Sources

    • CREST JST
    • Sharp Corporation

    Conference

    WI '17

    Acceptance Rates

    WI '17 Paper Acceptance Rate: 118 of 178 submissions, 66%
    Overall Acceptance Rate: 118 of 178 submissions, 66%


    Cited By

    • (2020) Concentration Areas of Sentiment Lexica in the Word Embedding Space. International Journal of Cognitive Informatics and Natural Intelligence 13(2), 48-62. DOI: 10.4018/IJCINI.2019040104. Online publication date: 1-Oct-2020.
    • (2020) The Layout of Sentiment Lexica in Word Vector Space. Innovations, Algorithms, and Applications in Cognitive Informatics and Natural Intelligence, 321-338. DOI: 10.4018/978-1-7998-3038-2.ch015. Online publication date: 2020.
    • (2019) A Method based on Sentence Embeddings for the Sub-Topics Detection. Journal of Physics: Conference Series 1168, 052004. DOI: 10.1088/1742-6596/1168/5/052004. Online publication date: 12-Mar-2019.
    • (2018) Semantically Readable Distributed Representation Learning and Its Expandability Using a Word Semantic Vector Dictionary. IEICE Transactions on Information and Systems E101.D(4), 1066-1078. DOI: 10.1587/transinf.2017DAP0019. Online publication date: 2018.
    • (2018) The Arrangement of Sentiment Lexica in the Space of Distributed Word Representations. 2018 IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), 240-245. DOI: 10.1109/ICCI-CC.2018.8482065. Online publication date: Jul-2018.
