DOI: 10.1145/3106426.3106521

Semantically readable distributed representation learning for social media mining

Published: 23 August 2017

Abstract

The problem with distributed representations generated by neural networks is that the meaning of the features is difficult to understand. We propose a new method that gives a specific meaning to each node of a hidden layer by introducing a manually created word semantic vector dictionary into the initial weights and by using paragraph vector models. Our experimental results demonstrated that weights obtained through learning and weights based on the dictionary are more strongly correlated in a closed test and more weakly correlated in an open test, compared with the results of a control test. Additionally, we found that the learned vectors outperform the existing paragraph vectors in a sentiment analysis task. Finally, we evaluated the readability of the document embeddings in a user test; in this paper, readability means that people can understand the meaning of the heavily weighted features of a distributed representation. In total, 52.4% of the top five weighted hidden nodes were related to the tweets for which one of the paragraph vector models had learned the document embeddings. Because each hidden node maintains a specific meaning, the proposed method succeeds in improving readability.
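The seeding idea lends itself to a short illustration. Below is a minimal sketch (not the authors' implementation) of initializing a paragraph vector model's word-embedding weights from a hand-built semantic vector dictionary, then checking how strongly the learned weights still correlate with the dictionary and which labeled hidden nodes dominate a document vector. The `semantic_dict`, `node_labels`, and toy tweets are hypothetical stand-ins, and gensim's Doc2Vec is used in place of the paper's own paragraph vector models.

```python
import numpy as np
from scipy.stats import pearsonr
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical dictionary: the i-th component of each word vector
# corresponds to the human-labeled concept node_labels[i].
node_labels = ["food", "sports", "music", "travel"]
semantic_dict = {
    "ramen":  np.array([0.9, 0.0, 0.0, 0.1], dtype=np.float32),
    "soccer": np.array([0.0, 0.9, 0.1, 0.0], dtype=np.float32),
    "guitar": np.array([0.0, 0.1, 0.9, 0.0], dtype=np.float32),
    "flight": np.array([0.1, 0.0, 0.0, 0.9], dtype=np.float32),
}

# Toy tweet corpus, tagged for doc2vec training.
tweets = [["ramen", "soccer"], ["guitar", "flight"], ["ramen", "guitar"]]
corpus = [TaggedDocument(words, [i]) for i, words in enumerate(tweets)]

# PV-DM model whose word vectors form the hidden layer we seed.
model = Doc2Vec(dm=1, vector_size=len(node_labels), min_count=1, epochs=50)
model.build_vocab(corpus)

# Replace the random initial weights with the dictionary vectors.
for word, vec in semantic_dict.items():
    if word in model.wv.key_to_index:
        model.wv.vectors[model.wv.key_to_index[word]] = vec

model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Closed-test style check: correlation between learned and dictionary weights.
for word, vec in semantic_dict.items():
    r, _ = pearsonr(model.wv[word], vec)
    print(f"{word}: r = {r:+.2f}")

# Readability check: name the top weighted nodes of one document vector.
doc_vec = model.dv[0]
top = np.argsort(-np.abs(doc_vec))[:2]
print("strongest nodes for tweet 0:", [node_labels[i] for i in top])
```

Because every dimension of the hidden layer keeps the human-assigned label it started with, reading off the top weighted nodes of a document vector yields the kind of interpretation the paper's user test measured.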


    Published In

    WI '17: Proceedings of the International Conference on Web Intelligence
    August 2017
    1284 pages
    ISBN: 9781450349512
    DOI: 10.1145/3106426

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 August 2017


    Author Tags

    1. Twitter
    2. distributed representation learning
    3. paragraph vector
    4. semantic lexicon
    5. semantic vector
    6. sentiment analysis
    7. word2vec

    Qualifiers

    • Research-article

    Funding Sources

    • CREST JST
    • Sharp Corporation

    Conference

    WI '17

    Acceptance Rates

    WI '17 Paper Acceptance Rate: 118 of 178 submissions, 66%
    Overall Acceptance Rate: 118 of 178 submissions, 66%


    Cited By

    • (2020) Concentration Areas of Sentiment Lexica in the Word Embedding Space. International Journal of Cognitive Informatics and Natural Intelligence 13(2), 48-62. DOI: 10.4018/IJCINI.2019040104. Online publication date: 1-Oct-2020.
    • (2020) The Layout of Sentiment Lexica in Word Vector Space. Innovations, Algorithms, and Applications in Cognitive Informatics and Natural Intelligence, 321-338. DOI: 10.4018/978-1-7998-3038-2.ch015. Online publication date: 2020.
    • (2019) A Method based on Sentence Embeddings for the Sub-Topics Detection. Journal of Physics: Conference Series 1168, 052004. DOI: 10.1088/1742-6596/1168/5/052004. Online publication date: 12-Mar-2019.
    • (2018) Semantically Readable Distributed Representation Learning and Its Expandability Using a Word Semantic Vector Dictionary. IEICE Transactions on Information and Systems E101.D(4), 1066-1078. DOI: 10.1587/transinf.2017DAP0019. Online publication date: 2018.
    • (2018) The Arrangement of Sentiment Lexica in the Space of Distributed Word Representations. 2018 IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), 240-245. DOI: 10.1109/ICCI-CC.2018.8482065. Online publication date: Jul-2018.
