Abstract
Crowdsourced review services such as IMDB or Rotten Tomatoes provide numerical ratings and raw text reviews to help users decide which movie to watch. Since the number of reviews per movie can be very large, selecting a movie from a catalog can be a tedious task. This problem can be addressed by presenting the user with the most relevant reviews or by building automatic review summaries. We take a different approach and predict personalized movie descriptions from text reviews using topic models based on Latent Dirichlet Allocation (LDA). Our models extract distinct qualitative and descriptive topics by combining text reviews and movie ratings in a joint probabilistic model. We evaluate our models on an IMDB dataset and illustrate their performance through a comparison of topics.
Notes
- 1.
For restaurants, the number of features is smaller and the optimal value of K is around 60.
References
Bird, S., Klein, E., Loper, E.: Natural language processing with Python. O’Reilly Media, Inc. (2009)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. JMLR 3, 993–1022 (2003)
Casella, G., Berger, R.L.: Statistical Inference, vol. 2. Duxbury Pacific Grove, CA (2002)
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: How humans interpret topic models. In: Advances in Neural Information Processing Systems (2009)
Diao, Q., Qiu, M., Wu, C.Y., Smola, A.J., Jiang, J., Wang, C.: Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In: Proceedings of ACM SIGKDD (2014)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Nat. Acad. Sci. 101(suppl 1), 5228–5235 (2004)
Hoffman, M.D., Blei, D.M., Bach, F.R.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems (2010)
Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. JMLR 14(1), 1303–1347 (2013)
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
Lin, C., He, Y.: Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 26th International Conference on Machine Learning, pp. 375–384 (2009)
Ling, G., Lyu, M.R., King, I.: Ratings meet reviews, a combined approach to recommend. In: Proceedings of the 8th ACM Conference on Recommender Systems, pp. 105–112. ACM (2014)
McAuley, J.J., Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems (2013)
Mcauliffe, J., Blei, D.: Supervised topic models. In: Advances in Neural Information Processing Systems, pp. 121–128 (2008)
Newman, D., Bonilla, E., Buntine, W.: Improving topic coherence with regularized topic models. In: Advances in Neural Information Processing Systems, pp. 496–504 (2011)
Titov, I., McDonald, R.: Modeling online reviews with multi-grain topic models. In: Proceedings of the 17th International Conference on World Wide Web, pp. 111–120. ACM (2008)
Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods for topic models. In: Proceedings of the 26th International Conference on Machine Learning (2009)
Appendices
A Variational derivation of LDA-R-C
In this section, we provide the full variational derivation of our model LDA-R-C presented in Fig. 3. Our objective is to maximize the likelihood of the observed corpus of documents \(\mathcal {W}=\{w^1,\ldots ,w^D\}\):
As this likelihood is intractable to compute, we maximize an approximation of the likelihood \(\mathcal {L}(q)=\sum _{d=1}^D\mathcal {L}^d(q)\) over a variational family of distributions. Following [8], we have for any \(w^d\in \mathcal {W}\):
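The bound in question is the standard mean-field evidence lower bound; the original equation appeared as an image, so we reproduce the generic form here for reference, writing \(h^d\) for the collection of hidden variables of document d (the notation \(h^d\) is ours):

```latex
\log p(w^d) \;=\; \log \int p(w^d, h^d)\, dh^d
\;\ge\; \mathbb{E}_q\!\left[\log p(w^d, h^d)\right] - \mathbb{E}_q\!\left[\log q(h^d)\right]
\;=\; \mathcal{L}^d(q),
```

which follows from Jensen's inequality and holds with equality when q equals the true posterior over \(h^d\).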
where q represents the variational model. We choose the variational model q to be in the mean-field variational family:
with, \(\forall d=1,\ldots ,D\):
- \(q(\phi ^k|\lambda ^k)\sim \)Dirichlet(\(\lambda ^k\)) with \(\lambda ^k\in \mathbb {R}^V\) and \(k=1,\ldots ,K\),
- \(q(\varPsi ^s|\varLambda ^s)\sim \)Dirichlet(\(\varLambda ^s\)) with \(\varLambda ^s\in \mathbb {R}^V\) and \(s\in \{+1,-1\}\),
- \(q(p^d|\pi ^d)\sim \)Dirichlet(\(\pi ^d\)) with \(\pi ^d\in \mathbb {R}^2\),
- \(q(z_n^d|\alpha _n^d)\sim \)Multinomial(\(\alpha _n^d\)) with \(\alpha _n^d\in \mathbb {R}^K\), \(\sum _k\alpha _{n,k}^d=1\) and \(n=1,\ldots ,N\),
- \(q(m_n^d|\mu _n^d)\sim \)Multinomial(\(\mu _n^d\)) with \(\mu _n^d\in \mathbb {R}^2\), \(\mu _{n,1}^d+\mu _{n,2}^d=1\) and \(n=1,\ldots ,N\).
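Under these choices, q factorizes as follows (a sketch assembled from the components listed above; the exact factorization appeared as an equation in the original):

```latex
q(\phi,\varPsi,p,z,m) \;=\;
\prod_{k=1}^{K} q(\phi^k|\lambda^k)
\prod_{s\in\{+1,-1\}} q(\varPsi^s|\varLambda^s)
\prod_{d=1}^{D} \Big[\, q(p^d|\pi^d) \prod_{n=1}^{N} q(z_n^d|\alpha_n^d)\, q(m_n^d|\mu_n^d) \Big].
```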
We also have:
with, \(\forall d=1,\ldots ,D\):
- \(p(\phi ^k|\eta )\sim \)Dirichlet(\(\eta \mathbf {1}\)) with \(\eta \in \mathbb {R}\) and \(k=1,\ldots ,K\),
- \(p(\varPsi ^s|\eta )\sim \)Dirichlet(\(\eta \mathbf {1}\)) with \(\eta \in \mathbb {R}\) and \(s\in \{+1,-1\}\),
- \(p(p^d|\omega )\sim \)Dirichlet(\(\omega \)) with \(\omega \in \mathbb {R}^2\),
- \(p(z_n^d|\theta ^{c^d})\sim \)Multinomial(\(\theta ^{c^d}\)) with \(\theta ^{c^d}\in \mathbb {R}^K\), \(\sum _k\theta ^{c^d}_{k}=1\) and \(n=1,\ldots ,N\),
- \(p(m_n^d|p^d)\sim \)Multinomial(\(p^d\)) with \(p^d\in \mathbb {R}^2\), \(p_1^d+p_2^d=1\) and \(n=1,\ldots ,N\),
- \(p(w_n^d|\phi ,\varPsi ,z_n^d,r^d,m_n^d)\sim \)Multinomial\(\left( \phi ^{z_n^d}\mathbf {1}[m_n^d=0]+\varPsi ^{r^d}\mathbf {1}[m_n^d=1]\right) \).
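The joint distribution of the generative model then factorizes accordingly (again a sketch assembled from the conditionals above, not the original displayed equation):

```latex
p(\mathcal{W},\phi,\varPsi,p,z,m) \;=\;
\prod_{k=1}^{K} p(\phi^k|\eta)
\prod_{s\in\{+1,-1\}} p(\varPsi^s|\eta)
\prod_{d=1}^{D} \Big[\, p(p^d|\omega) \prod_{n=1}^{N} p(z_n^d|\theta^{c^d})\, p(m_n^d|p^d)\, p(w_n^d|\phi,\varPsi,z_n^d,r^d,m_n^d) \Big].
```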
We then maximize \(\mathcal {L}(q)\) by iteratively maximizing \(\mathcal {L}(q)\) with respect to variational parameters \(\lambda ,\varLambda ,\pi ,\alpha ,\mu \) (E-step) then maximizing \(\mathcal {L}(q)\) with respect to hyperparameters \(\omega ,\eta \) (M-step).
A.1 Variational E-step
For the E-step, we maximize \(\mathcal {L}(q)\) with respect to the variational parameters \(\lambda ,\varLambda ,\pi ,\alpha ,\mu \) by alternately setting the gradient of \(\mathcal {L}(q)\) with respect to each parameter to zero. This gives the following updates for the variational parameters, for \(n=1,\ldots ,N\); \(k=1,\ldots ,K\); \(i=1,2\) and \(s\in \{-1,+1\}\):
\(\psi \) is the digamma function: \(\psi (x)=\frac{d}{dx}\ln \varGamma (x)\).
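The digamma function enters these updates through the standard Dirichlet expectation identity (a well-known fact, stated here for the topic parameters as an example):

```latex
\mathbb{E}_q\!\left[\log \phi^k_v\right] \;=\; \psi(\lambda^k_v) - \psi\Big(\textstyle\sum_{v'=1}^{V} \lambda^k_{v'}\Big),
\qquad \phi^k \sim \mathrm{Dirichlet}(\lambda^k).
```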
A.2 Variational M-step
For the M-step, we maximize \(\mathcal {L}(q)\) with respect to the hyperparameters \(\omega ,\eta \). We use Newton's method for each parameter, with the same scheme as in LDA [2]. We have the following derivatives for \(\omega \):
We have the following derivatives for \(\eta \):
We maximize \(\mathcal {L}(q)\) with respect to \(\omega \) by doing iterations of Newton steps until convergence:
where H is the Hessian \(H=\nabla ^2_{\omega ^{(t)}}\mathcal {L}(q)\). We then maximize \(\mathcal {L}(q)\) with respect to \(\eta \) by again doing iterations of Newton steps until convergence:
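As an illustration, the Newton iteration above can be sketched as follows for a generic scalar hyperparameter. The callables `grad` and `hess` are hypothetical stand-ins for the derivatives of \(\mathcal {L}(q)\) with respect to the hyperparameter; this is a minimal sketch, not the paper's implementation.

```python
def newton_maximize(grad, hess, x0, tol=1e-8, max_iter=100):
    """Maximize a smooth scalar function by Newton's method.

    grad, hess: callables giving the first and second derivative
    at a point; x0: starting point. Iterates until successive
    iterates agree to within tol.
    """
    x = x0
    for _ in range(max_iter):
        # Newton step: H^{-1} * gradient (scalar case: divide by the Hessian)
        x_new = x - grad(x) / hess(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy example: maximize f(x) = log(x) - x, whose maximum is at x = 1.
x_star = newton_maximize(grad=lambda x: 1.0 / x - 1.0,
                         hess=lambda x: -1.0 / x**2,
                         x0=0.5)
```

In practice one would run this once per hyperparameter, re-evaluating the derivatives of \(\mathcal {L}(q)\) at each step, exactly as in the scheme of [2].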
B Topics Extracted with HFT [12]
In this section, we present in Table 4 the 8 most significant topics out of the 100 topics inferred with LDA-R-C, SLDA and HFT. Both LDA-R-C and SLDA extract qualitative topics, whereas we could not extract qualitative topics with HFT. The descriptive topics of the three methods are consistent, centered around genres, sequels, actors or directors, and the three methods extract similar topics. We observe that topics extracted with LDA-R-C share more top words with SLDA than with HFT. For instance, in Table 3, 6 out of the 10 top words of topic T1 obtained with LDA-R-C also appear among the top words of SLDA's topic T3. Similarly, topics T2 to T5 extracted with LDA-R-C are respectively closer to topics T4 to T7 extracted with SLDA than to topics T4 to T7 extracted with HFT.
In HFT, the parameters of LDA are linked to the rating prediction parameters. As a result, the top words of the topics are still centered around generic genres, sequels, actors and directors, but they also contain words related to specific movies. For instance, in Table 3, topic T4 extracted with HFT is centered around comedy and contains the words sandler and ferrell, which are specific actor names, and wedding, which is a specific plot element. In topic T5 extracted with HFT, centered around animation movies, we find the words wall-e and nemo, which are specific titles, and costner, which is an actor name. The top words in both LDA-R-C and SLDA topics are more generic, which leads to better predictions. Indeed, a review about a comedy movie is more likely to contain funny than wedding, as only a few comedy movies are related to a wedding.
© 2017 Springer International Publishing AG
Dupuy, C., Bach, F., Diot, C. (2017). Qualitative and Descriptive Topic Extraction from Movie Reviews Using LDA. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science(), vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_7