Abstract
Unstructured text data is important in many applications because it reflects the thoughts of the people who create it. However, the latent information is difficult to extract because it is hidden in the unstructured text. This paper proposes a latent learning method that constructs a lexical structure capturing the relations between latent meanings and words. The resulting lexical structure derives useful information from unstructured text data, and this information can be used in various applications. This paper describes how to predict a rating from user-written reviews, which are one kind of unstructured text data. It also provides visualizations of the semantic lexical structures obtained from the analysis. As a result, the proposed method readily quantifies the semantic relations of words and shows good performance in predicting ratings from unstructured text data. The proposed method can contribute to analyzing unstructured text data from various perspectives on the latent meaning of words.





Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1A2A2A01005304), and by the Chung-Ang University Graduate Research Scholarship in 2015.
Cite this article
Seo, J., Yoo, K., Choi, S. et al. The latent learning model to derive semantic relations of words from unstructured text data in social media. Multimed Tools Appl 78, 28649–28663 (2019). https://doi.org/10.1007/s11042-018-6211-2
DOI: https://doi.org/10.1007/s11042-018-6211-2