
Qualitative and Descriptive Topic Extraction from Movie Reviews Using LDA

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2017)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10358))

Abstract

Crowdsourced review services such as IMDB or Rotten Tomatoes provide numerical ratings and raw text reviews to help users decide what movie to watch. As the number of reviews per movie can be very large, selecting a movie from a catalog can be a tedious task. This problem can be addressed by presenting the user with the most relevant reviews or by building automatic review summaries. We take a different approach and predict personalized movie descriptions from text reviews using topic models based on Latent Dirichlet Allocation (LDA). Our models extract distinct qualitative and descriptive topics by combining text reviews and movie ratings in a joint probabilistic model. We evaluate our models on an IMDB dataset and illustrate their performance through a comparison of topics.


Notes

  1. For restaurants, the number of features is smaller and the optimal value of K is around 60.

References

  1. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly Media, Inc. (2009)

  2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. JMLR 3, 993–1022 (2003)

  3. Casella, G., Berger, R.L.: Statistical Inference, vol. 2. Duxbury, Pacific Grove, CA (2002)

  4. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J.L., Blei, D.M.: Reading tea leaves: how humans interpret topic models. In: Advances in Neural Information Processing Systems (2009)

  5. Diao, Q., Qiu, M., Wu, C.Y., Smola, A.J., Jiang, J., Wang, C.: Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In: Proceedings of ACM SIGKDD (2014)

  6. Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Nat. Acad. Sci. 101(suppl 1), 5228–5235 (2004)

  7. Hoffman, M.D., Blei, D.M., Bach, F.R.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems (2010)

  8. Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. JMLR 14(1), 1303–1347 (2013)

  9. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  10. Lin, C., He, Y.: Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 26th International Conference on Machine Learning, pp. 375–384 (2009)

  11. Ling, G., Lyu, M.R., King, I.: Ratings meet reviews, a combined approach to recommend. In: Proceedings of the 8th ACM Conference on Recommender Systems, pp. 105–112. ACM (2014)

  12. McAuley, J.J., Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems (2013)

  13. Mcauliffe, J., Blei, D.: Supervised topic models. In: Advances in Neural Information Processing Systems, pp. 121–128 (2008)

  14. Newman, D., Bonilla, E., Buntine, W.: Improving topic coherence with regularized topic models. In: Advances in Neural Information Processing Systems, pp. 496–504 (2011)

  15. Titov, I., McDonald, R.: Modeling online reviews with multi-grain topic models. In: Proceedings of the 17th International Conference on World Wide Web, pp. 111–120. ACM (2008)

  16. Wallach, H.M., Murray, I., Salakhutdinov, R., Mimno, D.: Evaluation methods for topic models. In: Proceedings of the 26th International Conference on Machine Learning (2009)


Author information

Corresponding author

Correspondence to Christophe Dupuy.


Appendices

A Variational derivation of LDA-R-C

Fig. 3. Graphical representation of the model LDA-R-C, including reduced ratings, applied to D documents. White nodes represent hidden variables and colored nodes represent observed variables. The observed rating \(r^d\) is not shown for the sake of clarity. (Color figure online)

In this section, we provide the full variational derivation of our model LDA-R-C presented in Fig. 3. Our objective is to maximize the likelihood of the observed corpus of documents \(\mathcal {W}=\{w^1,\ldots ,w^D\}\):

$$ p(\mathcal {W}|\omega ,\{\theta ^{c^d}\}_d,\eta ,\{\hat{r^d}\}_d)= \prod _{d=1}^D p(w^d|\omega ,\theta ^{c^d},\eta ,\hat{r}^d). $$

As this likelihood is intractable to compute, we maximize an approximation of the likelihood \(\mathcal {L}(q)=\sum _{d=1}^D\mathcal {L}^d(q)\) over a variational family of distributions. Following [8], we have for any \(w^d\in \mathcal {W}\):

$$ \log p(w^d|\omega ,\theta ^{c^d},\eta ,\hat{r}^d)\ge \mathbb {E}_q[ \log p(w^d,z^d,m^d,p^d,\phi ,\varPsi |\omega ,\theta ^{c^d},\eta ,\hat{r}^d)] - \mathbb {E}_q[ \log q(z^d,m^d,p^d,\phi ,\varPsi )]\equiv \mathcal {L}^d(q), $$

where q represents the variational model. We choose the variational model q to be in the mean-field variational family:

$$ q(z^d,m^d,p^d,\phi ,\varPsi ) = q(\phi |\lambda )q(\varPsi |\varLambda )q(p^d|\pi ^d) \prod _{n=1}^N q(z_n^d|\alpha _n^d)q(m_n^d|\mu _n^d), $$

with, \(\forall d=1,\ldots ,D\):

  • \(q(\phi ^k|\lambda ^k)\sim \)Dirichlet(\(\lambda ^k\)) with \(\lambda ^k\in \mathbb {R}^V\) and \(k=1,\ldots ,K\),

  • \(q(\varPsi ^s|\varLambda ^s)\sim \)Dirichlet(\(\varLambda ^s\)) with \(\varLambda ^s\in \mathbb {R}^V\) and \(s\in \{+1,-1\}\),

  • \(q(p^d|\pi ^d)\sim \)Dirichlet(\(\pi ^d\)) with \(\pi ^d\in \mathbb {R}^2\),

  • \(q(z_n^d|\alpha _n^d)\sim \)Multinomial(\(\alpha _n^d\)) with \(\alpha _n^d\in \mathbb {R}^K\), \(\sum _k\alpha _{n,k}^d=1\) and \(n=1,\ldots ,N\),

  • \(q(m_n^d|\mu _n^d)\sim \)Multinomial(\(\mu _n^d\)) with \(\mu _n^d\in \mathbb {R}^2\), \(\mu _{n,1}^d+\mu _{n,2}^d=1\) and \(n=1,\ldots ,N\).

We also have the following factorization of the joint distribution (a sampling sketch of this generative structure is given after the list below):

$$ p(w^d,z^d,m^d,p^d,\phi ,\varPsi |\omega ,\theta ^{c^d},\eta ,\hat{r}^d) = p(\phi |\eta )p(\varPsi |\eta )p(p^d|\omega )\prod _{n=1}^N p(w_n^d|\phi ,\varPsi ,z_n^d,r^d,m_n^d)p(z_n^d|\theta ^{c^d})p(m_n^d|p^d), $$

with, \(\forall d=1,\ldots ,D\):

  • \(p(\phi ^k|\eta )\sim \)Dirichlet(\(\eta \mathbf {1}\)) with \(\eta \in \mathbb {R}\) and \(k=1,\ldots ,K\),

  • \(p(\varPsi ^s|\eta )\sim \)Dirichlet(\(\eta \mathbf {1}\)) with \(\eta \in \mathbb {R}\) and \(s\in \{+1,-1\}\),

  • \(p(p^d|\omega )\sim \)Dirichlet(\(\omega \)) with \(\omega \in \mathbb {R}^2\),

  • \(p(z_n^d|\theta ^{c^d})\sim \)Multinomial(\(\theta ^{c^d}\)) with \(\theta ^{c^d}\in \mathbb {R}^K\), \(\sum _k\theta ^{c^d}_{k}=1\) and \(n=1,\ldots ,N\),

  • \(p(m_n^d|p^d)\sim \)Multinomial(\(p^d\)) with \(p^d\in \mathbb {R}^2\), \(p_1^d+p_2^d=1\) and \(n=1,\ldots ,N\),

  • \(p(w_n^d|\phi ,\varPsi ,z_n^d,r^d,m_n^d)\sim \)Multinomial\(\left( \phi ^{z_n^d}\mathbf {1}[m_n^d=0]+\varPsi ^{r^d}\mathbf {1}[m_n^d=1]\right) \).
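As a concrete reading of the generative structure listed above, the following is a minimal sampling sketch in Python. The dimensions, the symmetric priors and the per-movie topic proportions \(\theta ^{c^d}\) used here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sampling sketch of the generative structure of LDA-R-C described above.
# All dimensions and prior values below are illustrative assumptions.
rng = np.random.default_rng(0)
K, V, N = 5, 50, 20           # number of topics, vocabulary size, words per review
eta = 0.1                     # symmetric Dirichlet prior on word distributions
omega = np.array([1.0, 1.0])  # Dirichlet prior on the descriptive/qualitative switch

phi = rng.dirichlet(eta * np.ones(V), size=K)   # descriptive topics phi^k
Psi = {+1: rng.dirichlet(eta * np.ones(V)),     # qualitative topics Psi^{+1}, Psi^{-1}
       -1: rng.dirichlet(eta * np.ones(V))}

def sample_document(theta_c, r, n_words=N):
    """Sample one review given its movie's topic proportions theta^{c^d}
    and its reduced rating r^d in {+1, -1}."""
    p = rng.dirichlet(omega)                  # switch proportions p^d
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta_c)          # topic assignment z_n^d ~ Mult(theta^{c^d})
        m = rng.choice(2, p=p)                # m_n^d = 0: descriptive word, 1: qualitative word
        word_dist = phi[z] if m == 0 else Psi[r]
        words.append(rng.choice(V, p=word_dist))
    return words

theta_example = rng.dirichlet(np.ones(K))     # hypothetical theta^{c^d} for one movie
print(sample_document(theta_example, r=+1))
```

The switch variable \(m_n^d\) is what separates descriptive words, drawn from \(\phi ^{z_n^d}\), from qualitative words, drawn from \(\varPsi ^{r^d}\).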

We then maximize \(\mathcal {L}(q)\) by alternating between maximizing it with respect to the variational parameters \(\lambda ,\varLambda ,\pi ,\alpha ,\mu \) (E-step) and maximizing it with respect to the hyperparameters \(\omega ,\eta \) (M-step).

1.1 A.1 Variational E-step

For the E-step, we maximize \(\mathcal {L}(q)\) with respect to the variational parameters \(\lambda ,\varLambda ,\pi ,\alpha ,\mu \) by alternately setting the gradient of \(\mathcal {L}(q)\) with respect to each parameter to zero. This gives the following updates for the variational parameters, for \(n=1,\ldots ,N\); \(k=1,\ldots ,K\); \(i=1,2\) and \(s\in \{-1,+1\}\) (a code sketch of these updates follows the equations):

$$ \left\{ \begin{array}{ccl} \alpha _{n,k}^d & \propto & \theta ^{c^d}_k\exp \left[ \mu _{n,1}^d\left( \psi (\lambda ^k_{w_n^d}) - \psi (\sum _j \lambda ^k_{j})\right) \right] ,\\ \pi _i^d & = & \omega _i + \sum _{n=1}^N\mu _{n,i}^d,\\ \mu _{n,1}^d & \propto & \exp \left[ \psi (\pi _1^d) + \sum _{k=1}^K \left( \psi (\lambda ^k_{w_n^d}) - \psi (\sum _j \lambda ^k_{j})\right) \right] , \\ \mu _{n,2}^d & \propto & \exp \left[ \psi (\pi _2^d) + \sum _{s\in \{-1,+1\}} \left( \psi (\varLambda ^s_{w_n^d}) - \psi (\sum _j \varLambda ^s_{j})\right) \right] , \\ \lambda ^k_{v} & = & \eta + \sum \limits _{d=1}^D\sum _{n=1}^{N_d}\mu _{n,1}^d\alpha _{n,k}^d\mathbf {1}[w_n^d=v],\\ \varLambda ^s_{v} & = & \eta + \sum \limits _{d:\, r^d=s}\sum _{n=1}^{N_d}\mu _{n,2}^d\mathbf {1}[w_n^d=v]. \end{array} \right. $$

\(\psi \) is the digamma function: \(\psi (x)=\frac{d}{dx}\ln \varGamma (x)\).
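The following is a minimal Python sketch of the per-document part of these coordinate updates (\(\alpha \), \(\mu \) and \(\pi \)), assuming the global variational parameters \(\lambda \) (a \(K\times V\) array, lam) and \(\varLambda \) (one length-\(V\) vector per value of s, Lam) are given; the global updates of \(\lambda \) and \(\varLambda \) then accumulate the corresponding statistics over all documents, as in the last two equations. Variable names and the number of inner iterations are illustrative.

```python
import numpy as np
from scipy.special import digamma

def e_step_document(words, theta_c, r, lam, Lam, omega, n_iter=20):
    """Per-document coordinate updates of alpha, mu and pi from the equations above.
    words: list of word indices of the review; theta_c: theta^{c^d}; r: reduced rating."""
    K, V = lam.shape
    N = len(words)
    # Expected log word probabilities E_q[log phi^k_v] and E_q[log Psi^s_v]
    Elog_phi = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
    Elog_psi = {s: digamma(Lam[s]) - digamma(Lam[s].sum()) for s in (+1, -1)}

    alpha = np.full((N, K), 1.0 / K)   # q(z_n^d | alpha_n^d), each row sums to 1
    mu = np.full((N, 2), 0.5)          # q(m_n^d | mu_n^d), each row sums to 1
    for _ in range(n_iter):
        pi = omega + mu.sum(axis=0)                       # update of pi^d
        for n, w in enumerate(words):
            # update of alpha_{n,k}^d (normalized over k)
            log_a = np.log(theta_c) + mu[n, 0] * Elog_phi[:, w]
            alpha[n] = np.exp(log_a - log_a.max())
            alpha[n] /= alpha[n].sum()
            # updates of mu_{n,1}^d and mu_{n,2}^d (normalized over the two values)
            log_m = np.array([
                digamma(pi[0]) + Elog_phi[:, w].sum(),
                digamma(pi[1]) + sum(Elog_psi[s][w] for s in (+1, -1)),
            ])
            mu[n] = np.exp(log_m - log_m.max())
            mu[n] /= mu[n].sum()
    return alpha, mu, pi
```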

1.2 A.2 Variational M-step

For the M-step, we maximize \(\mathcal {L}(q)\) with respect to the hyperparameters \(\omega ,\eta \). We use the Newton method for each parameter, using the same scheme as in LDA [2] (a code sketch of both Newton updates is given at the end of this subsection). We have the following derivatives for \(\omega \):

$$\left\{ \begin{array}{ccl} \frac{\partial }{\partial \omega _i}\mathcal {L}(q) & = & D\left( \psi (\sum _j \omega _j) - \psi (\omega _i) \right) + \sum _{d=1}^D \left( \psi (\pi ^d_i) -\psi (\sum _j\pi ^d_j)\right) ,\\ \\ \frac{\partial ^2}{\partial \omega _i\partial \omega _j}\mathcal {L}(q) & = & D\psi '(\sum _l\omega _l) - \mathbf {1}[i=j]\,D\psi '(\omega _i). \end{array} \right. $$

We have the following derivatives for \(\eta \):

$$\left\{ \begin{array}{ccl} \frac{\partial }{\partial \eta }\mathcal {L}(q) & = & (K+2)V\left( \psi (V\eta )-\psi (\eta )\right) + \sum _{v=1}^V\left( \sum _{k=1}^K\psi (\lambda ^k_{v}) +\sum _{s\in \{-1,+1\}}\psi (\varLambda ^s_{v})\right) \\ & & - V\left( \sum _{k=1}^K\psi (\sum _{v=1}^V\lambda ^k_{v}) + \sum _{s\in \{-1,+1\}}\psi (\sum _{v=1}^V\varLambda ^s_{v})\right) , \\ \\ \frac{\partial ^2}{\partial \eta ^2}\mathcal {L}(q) & = & (K+2)V\left( V\psi '(V\eta )-\psi '(\eta )\right) . \end{array} \right. $$

We maximize \(\mathcal {L}(q)\) with respect to \(\omega \) by doing iterations of Newton steps until convergence:

$$ \omega ^{(t+1)} = \omega ^{(t)} - H^{-1}\nabla _{\omega ^{(t)}}\mathcal {L}(q), $$

where H is the Hessian \(H=\nabla ^2_{\omega ^{(t)}}\mathcal {L}(q)\). We then maximize \(\mathcal {L}(q)\) with respect to \(\eta \) by again doing iterations of Newton steps until convergence:

$$ \eta ^{(t+1)} = \eta ^{(t)} - \left[ \frac{\partial ^2}{(\partial \eta ^{(t)}) ^2}\mathcal {L}(q)\right] ^{-1}\left( \frac{\partial }{\partial \eta ^{(t)}}\mathcal {L}(q)\right) . $$
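The following is a minimal Python sketch of both Newton updates, directly transcribing the derivatives above. It assumes the variational parameters from the E-step sketch (lam, Lam, and the \(\pi ^d\) of all documents stacked into a \(D\times 2\) array pi_all); names and iteration counts are illustrative, and no safeguard is included to keep the hyperparameters positive.

```python
import numpy as np
from scipy.special import digamma, polygamma

def m_step_eta(eta, lam, Lam, n_iter=50, tol=1e-6):
    """Newton iterations for the scalar hyperparameter eta."""
    K, V = lam.shape
    Lam_mat = np.vstack([Lam[+1], Lam[-1]])     # (2, V) array of qualitative parameters
    for _ in range(n_iter):
        grad = ((K + 2) * V * (digamma(V * eta) - digamma(eta))
                + digamma(lam).sum() + digamma(Lam_mat).sum()
                - V * (digamma(lam.sum(axis=1)).sum()
                       + digamma(Lam_mat.sum(axis=1)).sum()))
        hess = (K + 2) * V * (V * polygamma(1, V * eta) - polygamma(1, eta))
        step = grad / hess
        eta -= step
        if abs(step) < tol:
            break
    return eta

def m_step_omega(omega, pi_all, n_iter=50, tol=1e-6):
    """Newton iterations for omega, given pi^d of all D documents in pi_all (D, 2)."""
    D = pi_all.shape[0]
    for _ in range(n_iter):
        grad = (D * (digamma(omega.sum()) - digamma(omega))
                + (digamma(pi_all) - digamma(pi_all.sum(axis=1, keepdims=True))).sum(axis=0))
        hess = (D * polygamma(1, omega.sum()) * np.ones((2, 2))
                - np.diag(D * polygamma(1, omega)))
        step = np.linalg.solve(hess, grad)       # H^{-1} grad
        omega = omega - step
        if np.abs(step).max() < tol:
            break
    return omega
```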

B Topics Extracted with HFT [12]

In this section, we present in Table 4 the 8 most significant topics (out of 100) inferred with LDA-R-C, SLDA and HFT. Both LDA-R-C and SLDA extract qualitative topics, whereas we could not extract qualitative topics with HFT. The descriptive topics of the three methods are consistent, centered around genres, sequels, actors or directors, and the three methods extract similar topics. We observe that topics extracted with LDA-R-C share more top words with SLDA than with HFT. For instance, in Table 3, 6 out of 10 top words of topic T1 obtained with LDA-R-C also appear among the top words of SLDA's topic T3. In the same way, topics T2 to T5 extracted with LDA-R-C are respectively closer to topics T4 to T7 extracted with SLDA than to topics T4 to T7 extracted with HFT.

In HFT, the parameters of LDA are linked to the rating prediction parameters. As a result, the top words of the topics are still centered around generic genres, sequels, actors and directors, but they also contain words related to specific movies. For instance, in Table 3, the topic T4 extracted with HFT is centered around comedy and contains the words sandler and ferrell, which are specific actor names, and wedding, which refers to a specific plot element. In the topic T5 extracted with HFT, centered around animation movies, we find the words wall-e and nemo, which are specific titles, and costner, which is an actor name. The top words in both LDA-R-C and SLDA topics are more generic, leading to better predictions. Indeed, a review about a comedy movie is more likely to contain funny than wedding, as only a few comedy movies revolve around a wedding.

Table 4. 8 topics extracted with LDA-R-C, SLDA and HFT (\(K=100\)), with the associated score for SLDA (see [13] for details).

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dupuy, C., Bach, F., Diot, C. (2017). Qualitative and Descriptive Topic Extraction from Movie Reviews Using LDA. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_7

  • DOI: https://doi.org/10.1007/978-3-319-62416-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer Science (R0)
