Elsevier

Data & Knowledge Engineering

Volume 119, January 2019, Pages 139-149

Modelling user attitudes using hierarchical sentiment-topic model

https://doi.org/10.1016/j.datak.2019.01.005

Abstract

Uncovering the latent structure of hotly discussed topics, and the corresponding sentiments of different social media user groups (e.g., on Twitter), is critical for helping organizations and governments understand how users feel about their services and facilities, as well as about the events happening around them. Although numerous studies have explored sentiment analysis of the different aspects of a product, fewer works have focused on why users like or dislike those products. In this paper, we propose a novel probabilistic model, the Hierarchical User Sentiment Topic Model (HUSTM), which discovers the hidden structure of topics and users while performing sentiment analysis in a unified way. The goal of the HUSTM is to model users' attitudes (opinions) hierarchically using topic and sentiment information, where sentiment may be positive, negative, or neutral. Experimental results on real-world data sets show that the hierarchy obtained by the HUSTM is of high quality compared with those discovered by other state-of-the-art techniques.

Introduction

Constructing a hierarchical tree of topics and users' interests from a social media platform is an interesting and significant problem. On social media platforms such as Twitter and Weibo, users often express opinions on various aspects of a product, such as its overall design, battery capacity, screen size, and camera. A high-quality model of user interests at different levels of granularity has many valuable applications in summarization, search, and browsing. With such a model, a user could quickly compare two or more smartphones at different granularities by looking at the hierarchy. Individuals could also find other users who share the same opinions, and interests could be recommended to those users. For organizations, hierarchically modelling users' attitudes or interests can give insight into user interests across a variety of topics, help analyse user behaviour, and help locate influential users at any granularity level using their sentiment information.

Over the past decade, several author-topic models have been proposed to infer latent topics from authors' interests, assigning each topic a probability distribution over words and each author a distribution over topics [1], [2]. More recently, hierarchical topic models have been the focus of multiple works [3], [4], [5], [6], [7]. Although some of these recent methods (such as the Hierarchical Aspect-Sentiment Model (HASM) [7] and the Structured Sentiment Model [8]) can discover topics, organize them into groups or hierarchies, and identify the sentiment polarity towards those topics, they do not model users' interests well at different granularities. Yet users who share the same opinion on the same topic should be grouped together hierarchically in the tree. Moreover, existing work identifies topics or users' interests only when a user mentions those topics frequently, ignoring the sentiment trend on a given topic.
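The author-topic idea summarized above (each author a distribution over topics, each topic a distribution over words) can be illustrated with a minimal sketch of the standard author-topic generative process, in which each word's author is drawn uniformly from the document's co-authors. This is not the HUSTM: it only samples from given distributions rather than inferring them, and the function name `sample_author_topic_corpus` and the fixed document length are assumptions made for this illustration.

```python
import numpy as np


def sample_author_topic_corpus(doc_authors, theta, phi, doc_len, seed=0):
    """Sample (author, topic, word) triples under the author-topic generative story.

    doc_authors: list of author-id lists, one per document
    theta: (num_authors, num_topics) array; row a is author a's topic distribution
    phi:   (num_topics, vocab_size) array; row k is topic k's word distribution
    """
    rng = np.random.default_rng(seed)
    num_topics, vocab_size = phi.shape
    corpus = []
    for authors in doc_authors:
        doc = []
        for _ in range(doc_len):
            a = int(rng.choice(authors))                 # author drawn uniformly from the co-authors
            z = int(rng.choice(num_topics, p=theta[a]))  # topic drawn from that author's mixture
            w = int(rng.choice(vocab_size, p=phi[z]))    # word drawn from the topic's distribution
            doc.append((a, z, w))
        corpus.append(doc)
    return corpus


# Toy example: author 0 only writes about topic 0, author 1 only about topic 1.
theta = np.array([[1.0, 0.0], [0.0, 1.0]])
phi = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
corpus = sample_author_topic_corpus([[0], [1]], theta, phi, doc_len=5)
print(corpus[0][0], corpus[1][0])  # prints (0, 0, 0) (1, 1, 2)
```

With degenerate (one-hot) distributions the output is deterministic, which makes the role of each distribution easy to see; an inference procedure would run this story in reverse to recover theta and phi from observed words.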

To address these problems, this paper proposes a novel model called the Hierarchical User Sentiment Topic Model (HUSTM), which extends the HASM by adding a user-sentiment layer that captures users' topics of interest together with their sentiment. The primary goal of the HUSTM is to discover users' attitudes and interests regarding topics of different polarity in the text hierarchy. In the HUSTM, the entire structure is a tree in which each node is further split into two sub-nodes: (1) a topic-sentiment node, which models the word distribution over topic and sentiment (e.g., positive, negative, or neutral); and (2) a user-sentiment node, which captures users' attitudes together with their sentiment information. The HUSTM incorporates user-sentiment analysis into topic discovery to investigate the attitudes that users hold towards the topics found in the tree. Fig. 1 shows a topic hierarchy obtained by running the model on a smartphone data set. We experimentally demonstrate the effectiveness of the proposed model on three data sets. The results show that our model discovers a high-quality topical hierarchy compared with other methods.
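The two-part node described above can be pictured as a simple recursive data structure. The sketch below is a hypothetical representation, not the authors' implementation: the class name `HUSTMNode`, the use of plain dictionaries for distributions, and modelling the user-sentiment sub-node as one distribution over users per sentiment label are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical representation of one HUSTM tree node (not the authors' code).
Distribution = Dict[str, float]  # maps a word or user id to its probability


@dataclass
class HUSTMNode:
    # Topic-sentiment sub-node: one word distribution per sentiment label.
    topic_sentiment: Dict[str, Distribution]
    # User-sentiment sub-node (assumed form): one distribution over users per sentiment label.
    user_sentiment: Dict[str, Distribution]
    children: List["HUSTMNode"] = field(default_factory=list)

    def top_words(self, sentiment: str, k: int = 3) -> List[str]:
        """Most probable words for this topic under the given sentiment."""
        dist = self.topic_sentiment[sentiment]
        return sorted(dist, key=dist.get, reverse=True)[:k]

    def top_users(self, sentiment: str, k: int = 3) -> List[str]:
        """Users most strongly associated with this topic under the given sentiment."""
        dist = self.user_sentiment[sentiment]
        return sorted(dist, key=dist.get, reverse=True)[:k]


def tree_depth(node: HUSTMNode) -> int:
    """Number of levels in the subtree rooted at node."""
    return 1 + max((tree_depth(c) for c in node.children), default=0)


# Toy tree: a "smartphone" root with a "battery" child (all values made up).
battery = HUSTMNode(
    topic_sentiment={"negative": {"drains": 0.6, "battery": 0.3, "hours": 0.1}},
    user_sentiment={"negative": {"user_b": 0.7, "user_a": 0.3}},
)
root = HUSTMNode(
    topic_sentiment={"positive": {"phone": 0.5, "love": 0.4, "screen": 0.1}},
    user_sentiment={"positive": {"user_a": 0.8, "user_b": 0.2}},
    children=[battery],
)
print(root.top_words("positive", k=2), battery.top_users("negative", k=1), tree_depth(root))
```

Keeping both sub-nodes on the same object means a single traversal can answer "what is said about this topic" and "who says it, with which sentiment" at every granularity.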

The advantages of the proposed HUSTM over existing models are summarized as follows:

  • It provides a unified model that discovers the hierarchical tree of topics, sentiments, and users.

  • It infers the width and the depth of the tree from the data automatically.

  • It discovers topic hierarchies from both short and long texts by recursively modelling the words in an entire sentence.

  • It estimates users' interests and sentiments towards topics, which improves model accuracy.
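The excerpt does not specify how the HUSTM infers the width and depth of the tree. One standard nonparametric mechanism, used here purely as an assumption for illustration, is a Chinese restaurant process (CRP) at each node: an item joins an existing child with probability proportional to that child's size, or opens a new child with probability proportional to a concentration parameter `gamma`. The number of children is therefore learned from the data rather than fixed in advance, and applying the same rule recursively inside each child lets the depth grow as well. The function names below are hypothetical.

```python
import numpy as np


def crp_choose_child(child_sizes, gamma, rng):
    """Pick an existing child (index < len) or a new one (index == len) under a CRP."""
    weights = np.array(list(child_sizes) + [gamma], dtype=float)
    return int(rng.choice(len(weights), p=weights / weights.sum()))


def grow_children(num_items, gamma, seed=0):
    """Seat num_items items under one node; return the size of each child created."""
    rng = np.random.default_rng(seed)
    sizes = []
    for _ in range(num_items):
        k = crp_choose_child(sizes, gamma, rng)
        if k == len(sizes):
            sizes.append(1)  # open a new child node
        else:
            sizes[k] += 1    # join an existing child node
    return sizes


# A larger gamma tends to produce a wider node (more children).
print(len(grow_children(200, gamma=0.5)), len(grow_children(200, gamma=20.0)))
```

The "rich get richer" weighting keeps popular branches large, while `gamma` controls how readily new branches appear.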

The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the HUSTM in detail. Section 4 describes the experimental results. Finally, Section 5 concludes the paper.


Related work

Overall, three lines of work are related to ours: user (author) topic models, sentiment topic models, and hierarchical topic models.

User (author) topic models. A topic model is a type of statistical model that groups words in order to find hidden topics in document collections. A popular topic model that represents documents as mixtures of topics is Latent Dirichlet Allocation (LDA), which models each topic as a distribution over words. A number of recent author topic models, which merge the

Problem definition

Below, we define the problem addressed by the HUSTM and the related concepts.

Definition 1 Topical Tree

A topical tree is defined as T, where each node in the tree is itself a tree. Fig. 1 shows a magnified view of the proposed topical tree. As shown, each node Ψ in the tree consists of a topic-sentiment node and a user-sentiment node. Fig. 1(a) shows the topical structure of the HUSTM.

Definition 2 Topic-sentiment Node

A topic-sentiment node is a semantically coherent theme, represented by a multinomial distribution over all words in the vocabulary.
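A multinomial over the vocabulary, as in Definition 2, is commonly estimated from the word counts assigned to a node, with a symmetric Dirichlet prior beta smoothing unseen words: phi_w = (n_w + beta) / (n + V * beta), where n_w is the count of word w, n is the total count, and V is the vocabulary size. This posterior-mean estimate is the usual one in LDA-style models and is assumed here; the excerpt does not give the HUSTM's estimator, and the function names are hypothetical.

```python
import numpy as np


def estimate_word_distribution(word_counts, beta=0.01):
    """Smoothed multinomial over the vocabulary: (n_w + beta) / (n + V * beta)."""
    counts = np.asarray(word_counts, dtype=float)
    return (counts + beta) / (counts.sum() + counts.size * beta)


def top_words(phi, vocab, k=3):
    """Return the k most probable words under distribution phi."""
    order = np.argsort(-phi)[:k]
    return [vocab[i] for i in order]


# Toy counts for a hypothetical "battery / negative" topic-sentiment node.
vocab = ["drains", "battery", "charger", "camera"]
phi = estimate_word_distribution([3, 1, 0, 0], beta=1.0)
print(phi, top_words(phi, vocab, k=2))
```

Smoothing guarantees every vocabulary word keeps a small non-zero probability, which matters for short texts where many words are never observed at a given node.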

Hierarchical user-sentiment topic model

Hierarchical topic models, such as the HASM, discover topic hierarchies based on document-level word co-occurrence patterns. Applying these models to short texts may produce incomprehensible and incorrect trees, since short texts contain very limited word co-occurrence information. Such models also fail to incorporate users' sentiment information, which is critical for extracting a more meaningful tree. To tackle these problems, the HUSTM is proposed, which replaces the sentence-based layer with a

Experiments

The proposed model is evaluated on three aspects: coverage, parent–child relatedness, and topic-sentiment consistency. The experimental results show that the proposed model performs promisingly on all three.
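The excerpt does not define how parent–child relatedness is measured. A common choice when evaluating topic hierarchies, assumed here for illustration, compares the cosine similarity between a parent's word distribution and those of its children against its similarity to non-child topics; a good hierarchy should score higher for its own children. The function names and toy distributions are hypothetical.

```python
import numpy as np


def cosine(p, q):
    """Cosine similarity between two word distributions over the same vocabulary."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))


def parent_child_relatedness(parent, children, non_children):
    """Mean cosine similarity of a parent to its children and to non-child topics."""
    to_children = float(np.mean([cosine(parent, c) for c in children]))
    to_others = float(np.mean([cosine(parent, o) for o in non_children]))
    return to_children, to_others


# Toy distributions over the vocabulary ["battery", "charge", "camera"].
parent = [0.5, 0.5, 0.0]
children = [[0.6, 0.4, 0.0], [0.4, 0.6, 0.0]]
non_children = [[0.0, 0.1, 0.9]]
print(parent_child_relatedness(parent, children, non_children))
```

Reporting both numbers, rather than the child score alone, guards against hierarchies in which every topic looks similar to every other.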

Conclusion

This paper presents a hierarchical user sentiment topic model (HUSTM), which discovers the hierarchy of topics and users while simultaneously performing sentiment analysis. The primary advantage of the HUSTM is that it incorporates users' sentiment information into the model. It offers a general and effective model for answering questions about which topics the majority of users care about and why users like or dislike those topics. Experiments were conducted to evaluate the

Acknowledgments

This research is partially supported by the Natural Science Foundation of China (Grant No. 61672284) and the Australian Research Council (ARC) Discovery Project DP160104075.

Abdulqader Almars is currently a Ph.D. candidate in the School of Information Technology and Electrical Engineering at the University of Queensland, Brisbane, Australia. He obtained his Master's degree in Computer Science from the University of Queensland, Brisbane, Australia, in 2016. His research interests include big data analysis, sentiment analysis, and hierarchical topic models.

References (29)

  • M. Steyvers et al., Probabilistic author-topic models for information discovery.
  • D. Mimno et al.
  • R. Prescott Adams et al., Tree-structured stick breaking processes for hierarchical data, Advances in Neural Information Processing Systems (2010).
  • A. Ahmed et al., Dynamic non-parametric mixture models and the recurrent Chinese restaurant process: with applications to evolutionary clustering.
  • D. Mimno et al.
  • A. Ahmed et al.
  • S. Kim et al., A hierarchical aspect-sentiment model for online reviews, AAAI (2013).
  • A. Almars et al., Structured sentiment analysis, Advanced Data Mining and Applications (2017).
  • M. Sachan et al., Using content and interactions for discovering communities in social networks, WWW (2012).
  • D. Li et al., Adding community and dynamic to topic models, J. Informetrics (2012).
  • A. Almars et al., Learning Concept Hierarchy from Short Texts Using Context Coherence, 19th International Conference, Dubai, United Arab Emirates, November 12-15, 2018, Proceedings, Part I (2018).
  • C. Wang et al.
  • M. Rosen-Zvi et al., The author-topic model for authors and documents, Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (2004).
  • N. Pathak et al., Social Topic Models for Community Extraction (2008).

Xue Li is an Associate Professor in the School of Information Technology and Electrical Engineering at the University of Queensland. He graduated in Computer Software from Chongqing University, Chongqing, China in 1982 and obtained the MSc degree in Computer Science from the University of Queensland in 1990 and the Ph.D. degree in Information Systems from Queensland University of Technology in 1997. His major areas of research interests and expertise include Data Mining and Intelligent Web Information Systems. He is a member of ACM, IEEE, and SIGKDD.

Xin Zhao received the Ph.D. from the University of Queensland, Brisbane, Australia, in 2014. He is currently a postdoc at the University of Queensland. His current research interests include machine learning, data mining, healthcare analytics and social computing.
