
Comparing LDA and LSA Topic Models for Content-Based Movie Recommendation Systems

  • Conference paper
Web Information Systems and Technologies (WEBIST 2014)

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 226))


Abstract

We propose a plot-based recommendation system that evaluates the similarity between the plot of a video watched by a user and a large number of plots stored in a movie database. Because the system does not depend on the number of user ratings, it can propose famous and beloved movies as well as old or little-known movies and programs that are still strongly related to the content of the video the user has watched. The system implements and compares two topic models, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), on a movie database of two hundred thousand plots that was constructed by integrating different movie databases in a local NoSQL (MongoDB) DBMS. The behaviour of the topic models was examined on the basis of standard metrics and user evaluations. In addition, performance assessments with 30 users were conducted to compare our tool with a commercial system.
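As a rough illustration of the plot-similarity idea in the abstract, the sketch below ranks stored plots against a watched plot using their document-topic vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the vectors are hand-made toy values standing in for the output of a trained LSA or LDA model, cosine similarity is an assumed choice of measure, and the names `cosine_similarity`, `recommend`, `TOPIC_VECTORS`, and the plot identifiers are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def recommend(watched_id, topic_vectors, top_n=2):
    """Return the ids of the top_n plots most similar to the watched plot."""
    query = topic_vectors[watched_id]
    scored = [
        (cosine_similarity(query, vec), plot_id)
        for plot_id, vec in topic_vectors.items()
        if plot_id != watched_id  # never recommend the plot just watched
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [plot_id for _, plot_id in scored[:top_n]]


# Toy document-topic vectors over 3 hypothetical topics.
# In a real system these would come from a trained topic model.
TOPIC_VECTORS = {
    "space_opera": [0.9, 0.1, 0.0],
    "alien_invasion": [0.8, 0.2, 0.0],
    "courtroom_drama": [0.0, 0.1, 0.9],
    "romantic_comedy": [0.1, 0.9, 0.0],
}

print(recommend("space_opera", TOPIC_VECTORS))
```

Because the ranking uses only the plot vectors, a plot with no user ratings can be recommended as readily as a popular one, which is the property the abstract emphasises.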


Notes

  1. http://2011.camrachallenge.com/.

  2. http://www.imdb.com/.

  3. http://dbpedia.org/.

  4. http://www.themoviedb.org/.

  5. http://www.mongodb.org/.

  6. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/.

  7. One database access, using MongoDB, takes about 0.3 ms, while the extraction of keywords from a plot takes more than one second.

  8. Thus, we store the matrix of document-topic vectors to represent the training set.

  9. In [4] we determined that 500 is a good number of topics. This value keeps computational costs reasonable while maintaining an appropriate level of accuracy.

  10. Several experiments were conducted on a subset of the test set.

  11. http://radimrehurek.com/gensim/.

  12. The cost refers to a virtual machine set up with VMware Workstation 9.0.1, installed on a server that has the following features: OS: Ubuntu 12.04 LTS 64-bit; RAM: 8 GB; 20 GB dedicated to the virtual hard disk; 4 cores. The database management system used is MongoDB 2.4.1, installed on a machine with the following characteristics: OS: Windows Server 2008 R2 64-bit; CPU: Intel(R) Xeon E5620, 2.40 GHz; RAM: 12 GB.

  13. http://www.imdb.com/chart/top.

  14. http://www.jinni.com/.

  15. http://wordnet.princeton.edu/.

References

  1. Resnick, P., Varian, H.R.: Recommender systems. Commun. ACM 40, 56–58 (1997)


  2. Rashid, A.M., Karypis, G., Riedl, J.: Learning preferences of new users in recommender systems: an information theoretic approach. SIGKDD Explor. Newsl. 10, 90–100 (2008)


  3. Bergamaschi, S., Po, L., Sorrentino, S.: Comparing topic models for a movie recommendation system. In: Proceedings of 10th International Conference on Web Information Systems and Technologies (WEBIST 2014), Barcelona, Spain, Number 2, SCITEPRESS, pp. 172–183 (2014). ISBN 978-989-758-024-6


  4. Farinella, T., Bergamaschi, S., Po, L.: A non-intrusive movie recommendation system. In: Meersman, R., et al. (eds.) OTM 2012, Part II. LNCS, vol. 7566, pp. 736–751. Springer, Heidelberg (2012)


  5. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975)


  6. Musto, C.: Enhanced vector space models for content-based recommender systems. In: Proceedings of the Fourth ACM Conference on Recommender Systems. RecSys 2010, pp. 361–364. ACM, New York (2010)


  7. Park, L.A.F., Ramamohanarao, K.: An analysis of latent semantic term self-correlation. ACM Trans. Inf. Syst. 27, 8:1–8:35 (2009)


  8. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)


  9. Griffiths, T., Steyvers, M., Tenenbaum, J.: Topics in semantic representation. Psychol. Rev. 114, 211–244 (2007)


  10. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)


  11. Sorrentino, S., Bergamaschi, S., Parmiggiani, E.: A supervised method for lexical annotation of schema labels based on wikipedia. In: Atzeni, P., Cheung, D., Ram, S. (eds.) ER 2012 Main Conference 2012. LNCS, vol. 7532, pp. 359–368. Springer, Heidelberg (2012)


  12. Gunawardana, A., Shani, G.: A survey of accuracy evaluation metrics of recommendation tasks. J. Mach. Learn. Res. 10, 2935–2962 (2009)


  13. Ekstrand, M.D., Riedl, J.T., Konstan, J.A.: Collaborative filtering recommender systems. Found. Trends Hum.-Comput. Interact. 4, 81–173 (2011)


  14. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17, 734–749 (2005)


  15. Lee, M.D., Welsh, M.: An empirical evaluation of models of text document similarity. In: Proceedings of the 27th Annual Conference of the Cognitive Science Society. CogSci 2005, pp. 1254–1259. Erlbaum (2005)


  16. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42, 30–37 (2009)


  17. Gemulla, R., Nijkamp, E., Haas, P.J., Sismanis, Y.: Large-scale matrix factorization with distributed stochastic gradient descent. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011, pp. 69–77. ACM, New York (2011)


  18. Rehurek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, ELRA, pp. 45–50 (2010)


  19. Jin, X., Mobasher, B., Zhou, Y.: A web recommendation system based on maximum entropy. In: ITCC, pp. 213–218. IEEE Computer Society (2005)


  20. Krestel, R., Fankhauser, P., Nejdl, W.: Latent dirichlet allocation for tag recommendation. In: Bergman, L.D., Tuzhilin, A., Burke, R.D., Felfernig, A., Schmidt-Thieme, L. (eds.) RecSys, pp. 61–68. ACM (2009)


  21. Shi, Y., Larson, M., Hanjalic, A.: Mining contextual movie similarity with matrix factorization for context-aware recommendation. ACM Trans. Intell. Syst. Technol. 4, 16:1–16:19 (2013)


  22. Moshfeghi, Y., Piwowarski, B., Jose, J.M.: Handling data sparsity in collaborative filtering using emotion and semantic based features. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2011, pp. 625–634. ACM, New York (2011)


  23. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41, 1–69 (2009)


  24. Po, L., Sorrentino, S.: Automatic generation of probabilistic relationships for improving schema matching. Inf. Syst. 36, 192–208 (2011)


  25. Sorrentino, S., Bergamaschi, S., Gawinecki, M., Po, L.: Schema label normalization for improving schema matching. Data Knowl. Eng. 69, 1254–1273 (2010)


  26. Bergamaschi, S., Bouquet, P., Giacomuzzi, D., Guerra, F., Po, L., Vincini, M.: An incremental method for the lexical annotation of domain ontologies. Int. J. Semantic Web Inf. Syst. 3, 57–80 (2007)



Acknowledgements

The system was developed as a collaboration between the database group of the University of Modena and Reggio Emilia and vfree.tv (http://vfree.tv), a young and innovative German company focused on creating new ways of distributing television content and generating an unprecedented watching experience for the user.

We also want to express our gratitude to Tania Farinella, Matteo Abbruzzo and Olga Kryukova, master's students in Computer Engineering and Science at the Department of Engineering “Enzo Ferrari” at the University of Modena and Reggio Emilia, for their contribution to the implementation of the first and second versions of the system (without and with LDA) and for their support during the evaluation of the system. Particular appreciation goes to Serena Sorrentino, who helped us integrate the LDA models into our system.

Author information


Corresponding author

Correspondence to Laura Po.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bergamaschi, S., Po, L. (2015). Comparing LDA and LSA Topic Models for Content-Based Movie Recommendation Systems. In: Monfort, V., Krempels, KH. (eds) Web Information Systems and Technologies. WEBIST 2014. Lecture Notes in Business Information Processing, vol 226. Springer, Cham. https://doi.org/10.1007/978-3-319-27030-2_16


  • DOI: https://doi.org/10.1007/978-3-319-27030-2_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27029-6

  • Online ISBN: 978-3-319-27030-2

  • eBook Packages: Computer Science, Computer Science (R0)
