
Qualitative and Descriptive Topic Extraction from Movie Reviews Using LDA

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2017)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10358))

Abstract

Crowdsourced review services such as IMDB or Rotten Tomatoes provide numerical ratings and raw text reviews to help users decide what movie to watch. As the number of reviews per movie can be very large, selecting a movie from a catalog can be a tedious task. This problem can be addressed by presenting the user with the most relevant reviews or by building automatic review summaries. We take a different approach and predict personalized movie descriptions from text reviews using topic models based on Latent Dirichlet Allocation (LDA). Our models extract distinct qualitative and descriptive topics by combining text reviews and movie ratings in a joint probabilistic model. We evaluate our models on an IMDB dataset and illustrate their performance through a comparison of topics.


Notes

  1. For restaurants, the number of features is smaller and the optimal value of K is around 60.

References

  1. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly Media, Inc. (2009)

  2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. JMLR 3, 993–1022 (2003)

  3. Casella, G., Berger, R.L.: Statistical Inference, vol. 2. Duxbury, Pacific Grove, CA (2002)

  4. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems (2009)

  5. Diao, Q., Qiu, M., Wu, C.Y., Smola, A.J., Jiang, J., Wang, C.: Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In: Proceedings of ACM SIGKDD (2014)

  6. Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Nat. Acad. Sci. 101(suppl 1), 5228–5235 (2004)

  7. Hoffman, M.D., Blei, D.M., Bach, F.R.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems (2010)

  8. Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. JMLR 14(1), 1303–1347 (2013)

  9. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  10. Lin, C., He, Y.: Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 26th International Conference on Machine Learning, pp. 375–384 (2009)

  11. Ling, G., Lyu, M.R., King, I.: Ratings meet reviews, a combined approach to recommend. In: Proceedings of the 8th ACM Conference on Recommender Systems, pp. 105–112. ACM (2014)

  12. McAuley, J.J., Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems (2013)

  13. Mcauliffe, J., Blei, D.: Supervised topic models. In: Advances in Neural Information Processing Systems, pp. 121–128 (2008)

  14. Newman, D., Bonilla, E., Buntine, W.: Improving topic coherence with regularized topic models. In: Advances in Neural Information Processing Systems, pp. 496–504 (2011)

  15. Titov, I., McDonald, R.: Modeling online reviews with multi-grain topic models. In: Proceedings of the 17th International Conference on World Wide Web, pp. 111–120. ACM (2008)

  16. Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods for topic models. In: Proceedings of the 26th International Conference on Machine Learning (2009)


Author information

Corresponding author

Correspondence to Christophe Dupuy.


Appendices

A Variational derivation of LDA-R-C

Fig. 3. Graphical representation of the model LDA-R-C, including reduced ratings, applied to D documents. White nodes represent hidden variables and colored nodes represent observed variables. The observed rating \(r^d\) is not shown for the sake of clarity. (Color figure online)

In this section, we provide the full variational derivation of our model LDA-R-C presented in Fig. 3. Our objective is to maximize the likelihood of the observed corpus of documents \(\mathcal {W}=\{w^1,\ldots ,w^D\}\):

$$ p(\mathcal {W}|\omega ,\{\theta ^{c^d}\}_d,\eta ,\{\hat{r^d}\}_d)= \prod _{d=1}^D p(w^d|\omega ,\theta ^{c^d},\eta ,\hat{r}^d). $$

As this likelihood is intractable to compute, we maximize an approximation of the likelihood \(\mathcal {L}(q)=\sum _{d=1}^D\mathcal {L}^d(q)\) over a variational family of distributions. Following [8], we have for any \(w^d\in \mathcal {W}\):

$$ \log p(w^d|\omega ,\theta ^{c^d},\eta ,\hat{r}^d)\ge \mathbb {E}_q[ \log p(w^d,z^d,m^d,p^d,\phi ,\varPsi |\omega ,\theta ^{c^d},\eta ,\hat{r}^d)] - \mathbb {E}_q[ \log q(z^d,m^d,p^d,\phi ,\varPsi )]\equiv \mathcal {L}^d(q), $$

where q represents the variational model. We choose the variational model q to be in the mean-field variational family:

$$ q(z^d,m^d,p^d,\phi ,\varPsi ) = q(\phi |\lambda )q(\varPsi |\varLambda )q(p^d|\pi ^d) \prod _{n=1}^N q(z_n^d|\alpha _n^d)q(m_n^d|\mu _n^d), $$

with, \(\forall d=1,\ldots ,D\):

  • \(q(\phi ^k|\lambda ^k)\sim \)Dirichlet(\(\lambda ^k\)) with \(\lambda ^k\in \mathbb {R}^V\) and \(k=1,\ldots ,K\),

  • \(q(\varPsi ^s|\varLambda ^s)\sim \)Dirichlet(\(\varLambda ^s\)) with \(\varLambda ^s\in \mathbb {R}^V\) and \(s\in \{+1,-1\}\),

  • \(q(p^d|\pi ^d)\sim \)Dirichlet(\(\pi ^d\)) with \(\pi ^d\in \mathbb {R}^2\),

  • \(q(z_n^d|\alpha _n^d)\sim \)Multinomial(\(\alpha _n^d\)) with \(\alpha _n^d\in \mathbb {R}^K\), \(\sum _k\alpha _{n,k}^d=1\) and \(n=1,\ldots ,N\),

  • \(q(m_n^d|\mu _n^d)\sim \)Multinomial(\(\mu _n^d\)) with \(\mu _n^d\in \mathbb {R}^2\), \(\mu _{n,1}^d+\mu _{n,2}^d=1\) and \(n=1,\ldots ,N\).

We also have the following factorization of the joint distribution (a sampling sketch of this generative structure is given after the list below):

$$ p(w^d,z^d,m^d,p^d,\phi ,\varPsi |\omega ,\theta ^{c^d},\eta ,\hat{r}^d) = p(\phi |\eta )p(\varPsi |\eta )p(p^d|\omega )\prod _{n=1}^N p(w_n^d|\phi ,\varPsi ,z_n^d,r^d,m_n^d)p(z_n^d|\theta ^{c^d})p(m_n^d|p^d), $$

with, \(\forall d=1,\ldots ,D\):

  • \(p(\phi ^k|\eta )\sim \)Dirichlet(\(\eta \mathbf {1}\)) with \(\eta \in \mathbb {R}\) and \(k=1,\ldots ,K\),

  • \(p(\varPsi ^s|\eta )\sim \)Dirichlet(\(\eta \mathbf {1}\)) with \(\eta \in \mathbb {R}\) and \(s\in \{+1,-1\}\),

  • \(p(p^d|\omega )\sim \)Dirichlet(\(\omega \)) with \(\omega \in \mathbb {R}^2\),

  • \(p(z_n^d|\theta ^{c^d})\sim \)Multinomial(\(\theta ^{c^d}\)) with \(\theta ^{c^d}\in \mathbb {R}^K\), \(\sum _k\theta ^{c^d}_{k}=1\) and \(n=1,\ldots ,N\),

  • \(p(m_n^d|p^d)\sim \)Multinomial(\(p^d\)) with \(p^d\in \mathbb {R}^2\), \(p_1^d+p_2^d=1\) and \(n=1,\ldots ,N\),

  • \(p(w_n^d|\phi ,\varPsi ,z_n^d,r^d,m_n^d)\sim \)Multinomial\(\left( \phi ^{z_n^d}\mathbf {1}[m_n^d=0]+\varPsi ^{r^d}\mathbf {1}[m_n^d=1]\right) \).
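As a concrete reading of the generative structure listed above, the following is a minimal sampling sketch in Python. The dimensions, the symmetric priors and the per-movie topic proportions \(\theta ^{c^d}\) used here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sampling sketch of the generative structure of LDA-R-C described above.
# All dimensions and prior values below are illustrative assumptions.
rng = np.random.default_rng(0)
K, V, N = 5, 50, 20           # number of topics, vocabulary size, words per review
eta = 0.1                     # symmetric Dirichlet prior on word distributions
omega = np.array([1.0, 1.0])  # Dirichlet prior on the descriptive/qualitative switch

phi = rng.dirichlet(eta * np.ones(V), size=K)   # descriptive topics phi^k
Psi = {+1: rng.dirichlet(eta * np.ones(V)),     # qualitative topics Psi^{+1}, Psi^{-1}
       -1: rng.dirichlet(eta * np.ones(V))}

def sample_document(theta_c, r, n_words=N):
    """Sample one review given its movie's topic proportions theta^{c^d}
    and its reduced rating r^d in {+1, -1}."""
    p = rng.dirichlet(omega)                  # switch proportions p^d
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta_c)          # topic assignment z_n^d ~ Mult(theta^{c^d})
        m = rng.choice(2, p=p)                # m_n^d = 0: descriptive word, 1: qualitative word
        word_dist = phi[z] if m == 0 else Psi[r]
        words.append(rng.choice(V, p=word_dist))
    return words

theta_example = rng.dirichlet(np.ones(K))     # hypothetical theta^{c^d} for one movie
print(sample_document(theta_example, r=+1))
```

The switch variable \(m_n^d\) is what separates descriptive words, drawn from \(\phi ^{z_n^d}\), from qualitative words, drawn from \(\varPsi ^{r^d}\).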

We then maximize \(\mathcal {L}(q)\) by alternating between maximizing it with respect to the variational parameters \(\lambda ,\varLambda ,\pi ,\alpha ,\mu \) (E-step) and maximizing it with respect to the hyperparameters \(\omega ,\eta \) (M-step).

1.1 A.1 Variational E-step

For the E-step, we maximize \(\mathcal {L}(q)\) with respect to the variational parameters \(\lambda ,\varLambda ,\pi ,\alpha ,\mu \) by alternately setting the gradient of \(\mathcal {L}(q)\) with respect to each parameter to zero. This gives the following updates for the variational parameters, for \(n=1,\ldots ,N\); \(k=1,\ldots ,K\); \(i=1,2\) and \(s\in \{-1,+1\}\) (a code sketch of these updates follows the equations):

$$ \left\{ \begin{array}{ccl} \alpha _{n,k}^d & \propto & \theta ^{c^d}_k\exp \left[ \mu _{n,1}^d\left( \psi (\lambda ^k_{w_n^d}) - \psi (\sum _j \lambda ^k_{j})\right) \right] ,\\ \pi _i^d & = & \omega _i + \sum _{n=1}^N\mu _{n,i}^d,\\ \mu _{n,1}^d & \propto & \exp \left[ \psi (\pi _1^d) + \sum _{k=1}^K \left( \psi (\lambda ^k_{w_n^d}) - \psi (\sum _j \lambda ^k_{j})\right) \right] , \\ \mu _{n,2}^d & \propto & \exp \left[ \psi (\pi _2^d) + \sum _{s\in \{-1,+1\}} \left( \psi (\varLambda ^s_{w_n^d}) - \psi (\sum _j \varLambda ^s_{j})\right) \right] , \\ \lambda ^k_{v} & = & \eta + \sum \limits _{d=1}^D\sum _{n=1}^{N_d}\mu _{n,1}^d\alpha _{n,k}^d\mathbf {1}[w_n^d=v],\\ \varLambda ^s_{v} & = & \eta + \sum \limits _{d:\, r^d=s}\sum _{n=1}^{N_d}\mu _{n,2}^d\mathbf {1}[w_n^d=v]. \end{array} \right. $$

\(\psi \) is the digamma function: \(\psi (x)=\frac{d}{dx}\ln \varGamma (x)\).
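The following is a minimal Python sketch of the per-document part of these coordinate updates (\(\alpha \), \(\mu \) and \(\pi \)), assuming the global variational parameters \(\lambda \) (a \(K\times V\) array, lam) and \(\varLambda \) (one length-\(V\) vector per value of s, Lam) are given; the global updates of \(\lambda \) and \(\varLambda \) then accumulate the corresponding statistics over all documents, as in the last two equations. Variable names and the number of inner iterations are illustrative.

```python
import numpy as np
from scipy.special import digamma

def e_step_document(words, theta_c, r, lam, Lam, omega, n_iter=20):
    """Per-document coordinate updates of alpha, mu and pi from the equations above.
    words: list of word indices of the review; theta_c: theta^{c^d}; r: reduced rating."""
    K, V = lam.shape
    N = len(words)
    # Expected log word probabilities E_q[log phi^k_v] and E_q[log Psi^s_v]
    Elog_phi = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    Elog_psi = {s: digamma(Lam[s]) - digamma(Lam[s].sum()) for s in (+1, -1)}

    alpha = np.full((N, K), 1.0 / K)   # q(z_n^d | alpha_n^d), each row sums to 1
    mu = np.full((N, 2), 0.5)          # q(m_n^d | mu_n^d), each row sums to 1
    for _ in range(n_iter):
        pi = omega + mu.sum(axis=0)                       # update of pi^d
        for n, w in enumerate(words):
            # update of alpha_{n,k}^d (normalized over k)
            log_a = np.log(theta_c) + mu[n, 0] * Elog_phi[:, w]
            alpha[n] = np.exp(log_a - log_a.max())
            alpha[n] /= alpha[n].sum()
            # updates of mu_{n,1}^d and mu_{n,2}^d (normalized over the two values)
            log_m = np.array([
                digamma(pi[0]) + Elog_phi[:, w].sum(),
                digamma(pi[1]) + sum(Elog_psi[s][w] for s in (+1, -1)),
            ])
            mu[n] = np.exp(log_m - log_m.max())
            mu[n] /= mu[n].sum()
    return alpha, mu, pi
```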

1.2 A.2 Variational M-step

For the M-step, we maximize \(\mathcal {L}(q)\) with respect to the hyperparameters \(\omega ,\eta \). We use the Newton method for each parameter, using the same scheme as in LDA [2] (a code sketch of both Newton updates is given at the end of this subsection). We have the following derivatives for \(\omega \):

$$\left\{ \begin{array}{ccl} \frac{\partial }{\partial \omega _i}\mathcal {L}(q) & = & D\left( \psi (\sum _j \omega _j) - \psi (\omega _i) \right) + \sum _{d=1}^D \left( \psi (\pi ^d_i) -\psi (\sum _j\pi ^d_j)\right) ,\\ \\ \frac{\partial ^2}{\partial \omega _i\partial \omega _j}\mathcal {L}(q) & = & D\psi '(\sum _l\omega _l) - \mathbf {1}[i=j]\,D\psi '(\omega _i). \end{array} \right. $$

We have the following derivatives for \(\eta \):

$$\left\{ \begin{array}{ccl} \frac{\partial }{\partial \eta }\mathcal {L}(q) & = & (K+2)V\left( \psi (V\eta )-\psi (\eta )\right) + \sum _{v=1}^V\left( \sum _{k=1}^K\psi (\lambda ^k_{v}) +\sum _{s\in \{-1,+1\}}\psi (\varLambda ^s_{v})\right) \\ & & - V\left( \sum _{k=1}^K\psi (\sum _{v=1}^V\lambda ^k_{v}) + \sum _{s\in \{-1,+1\}}\psi (\sum _{v=1}^V\varLambda ^s_{v})\right) , \\ \\ \frac{\partial ^2}{\partial \eta ^2}\mathcal {L}(q) & = & (K+2)V\left( V\psi '(V\eta )-\psi '(\eta )\right) . \end{array} \right. $$

We maximize \(\mathcal {L}(q)\) with respect to \(\omega \) by doing iterations of Newton steps until convergence:

$$ \omega ^{(t+1)} = \omega ^{(t)} - H^{-1}\nabla _{\omega ^{(t)}}\mathcal {L}(q), $$

where H is the Hessian \(H=\nabla ^2_{\omega ^{(t)}}\mathcal {L}(q)\). We then maximize \(\mathcal {L}(q)\) with respect to \(\eta \) by again doing iterations of Newton steps until convergence:

$$ \eta ^{(t+1)} = \eta ^{(t)} - \left[ \frac{\partial ^2}{(\partial \eta ^{(t)}) ^2}\mathcal {L}(q)\right] ^{-1}\left( \frac{\partial }{\partial \eta ^{(t)}}\mathcal {L}(q)\right) . $$
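The following is a minimal Python sketch of both Newton updates, directly transcribing the derivatives above. It assumes the variational parameters from the E-step sketch (lam, Lam, and the \(\pi ^d\) of all documents stacked into a \(D\times 2\) array pi_all); names and iteration counts are illustrative, and no safeguard is included to keep the hyperparameters positive.

```python
import numpy as np
from scipy.special import digamma, polygamma

def m_step_eta(eta, lam, Lam, n_iter=50, tol=1e-6):
    """Newton iterations for the scalar hyperparameter eta."""
    K, V = lam.shape
    Lam_mat = np.vstack([Lam[+1], Lam[-1]])     # (2, V) array of qualitative parameters
    for _ in range(n_iter):
        grad = ((K + 2) * V * (digamma(V * eta) - digamma(eta))
                + digamma(lam).sum() + digamma(Lam_mat).sum()
                - V * (digamma(lam.sum(axis=1)).sum()
                       + digamma(Lam_mat.sum(axis=1)).sum()))
        hess = (K + 2) * V * (V * polygamma(1, V * eta) - polygamma(1, eta))
        step = grad / hess
        eta -= step
        if abs(step) < tol:
            break
    return eta

def m_step_omega(omega, pi_all, n_iter=50, tol=1e-6):
    """Newton iterations for omega, given pi^d of all D documents in pi_all (D, 2)."""
    D = pi_all.shape[0]
    for _ in range(n_iter):
        grad = (D * (digamma(omega.sum()) - digamma(omega))
                + (digamma(pi_all) - digamma(pi_all.sum(axis=1, keepdims=True))).sum(axis=0))
        hess = (D * polygamma(1, omega.sum()) * np.ones((2, 2))
                - np.diag(D * polygamma(1, omega)))
        step = np.linalg.solve(hess, grad)       # H^{-1} grad
        omega = omega - step
        if np.abs(step).max() < tol:
            break
    return omega
```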

B Topics Extracted with HFT [12]

In this section, we present in Table 4 the 8 most significant topics (out of 100) inferred with LDA-R-C, SLDA and HFT. Both LDA-R-C and SLDA extract qualitative topics, whereas we could not extract qualitative topics with HFT. The descriptive topics of the three methods are consistent, centered around genres, sequels, actors or directors, and the three methods extract similar topics. We observe that topics extracted with LDA-R-C share more top words with SLDA than with HFT. For instance, in Table 3, 6 out of 10 top words of topic T1 obtained with LDA-R-C also appear among the top words of SLDA's topic T3. In the same way, topics T2 to T5 extracted with LDA-R-C are respectively closer to topics T4 to T7 extracted with SLDA than to topics T4 to T7 extracted with HFT.

In HFT, the parameters of LDA are linked to the rating prediction parameters. As a result, the top words of the topics are still centered around generic genres, sequels, actors and directors, but they also contain words related to specific movies. For instance, in Table 3, the topic T4 extracted with HFT is centered around comedy and contains the words sandler and ferrell, which are specific actor names, and wedding, which refers to a specific plot element. In the topic T5 extracted with HFT, centered around animation movies, we find the words wall-e and nemo, which are specific titles, and costner, which is an actor name. The top words in both LDA-R-C and SLDA topics are more generic, leading to better predictions. Indeed, a review about a comedy movie is more likely to contain funny than wedding, as only a few comedy movies revolve around a wedding.

Table 4. 8 topics extracted with LDA-R-C, SLDA and HFT (\(K=100\)), with the associated score for SLDA (see [13] for details).

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dupuy, C., Bach, F., Diot, C. (2017). Qualitative and Descriptive Topic Extraction from Movie Reviews Using LDA. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_7

  • DOI: https://doi.org/10.1007/978-3-319-62416-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer Science (R0)
