Comparing LDA and LSA Topic Models for Content-Based Movie Recommendation Systems

Bergamaschi, Sonia; Po, Laura

doi:10.1007/978-3-319-27030-2_16


Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 226)

Included in the following conference series:

International Conference on Web Information Systems and Technologies

Abstract

We propose a plot-based recommendation system that evaluates the similarity between the plot of a video watched by a user and a large number of plots stored in a movie database. Because the system is independent of the number of user ratings, it can propose famous and beloved movies as well as old or little-known movies and programs that are still strongly related to the content of the video the user has watched. The system implements and compares two topic models, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), on a movie database of two hundred thousand plots constructed by integrating different movie databases in a local NoSQL (MongoDB) DBMS. The behaviour of the topic models has been examined on the basis of standard metrics and user evaluations, and performance assessments with 30 users have been conducted to compare our tool with a commercial system.
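
The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes each plot has already been reduced to a topic vector by a topic model such as LSA or LDA, and it assumes cosine similarity as the comparison measure; the abstract does not name the measure. The movie identifiers, the three-topic vectors and the function names `cosine` and `recommend` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(watched_id, topic_vectors, top_n=2):
    """Return the ids of the top_n plots most similar to the watched one.

    topic_vectors maps a movie id to its topic vector. No user ratings
    are involved: the ranking depends only on plot content.
    """
    query = topic_vectors[watched_id]
    scored = [
        (cosine(query, vec), movie_id)
        for movie_id, vec in topic_vectors.items()
        if movie_id != watched_id  # never recommend the watched video itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [movie_id for _, movie_id in scored[:top_n]]


# Hypothetical toy data: four plots described by three topics.
TOPIC_VECTORS = {
    "space_heist": [0.7, 0.2, 0.1],
    "moon_robbery": [0.6, 0.3, 0.1],
    "farm_romance": [0.05, 0.05, 0.9],
    "alien_courtroom": [0.3, 0.6, 0.1],
}

if __name__ == "__main__":
    print(recommend("space_heist", TOPIC_VECTORS))
```

Because the score is computed from plot content alone, a little-known title can outrank a popular one whenever its topic vector is closer to that of the watched video.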

Notes

1. http://2011.camrachallenge.com/
2. http://www.imdb.com/
3. http://dbpedia.org/
4. http://www.themoviedb.org/
5. http://www.mongodb.org/
6. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
7. One database access, using MongoDB, takes about 0.3 ms, while the extraction of keywords from a plot takes more than one second.
8. Thus, we store the matrix of document-topic vectors to represent the training set.
9. In our previous work [4], we determined 500 to be a good number of topics. This value keeps computational costs reasonable while maintaining an appropriate level of accuracy.
10. Several experiments were conducted on a subset of the test set.
11. http://radimrehurek.com/gensim/
12. The cost refers to a virtual machine set up with VMWare Workstation 9.0.1, installed on a server with the following features: OS: Ubuntu 12.04 LTS 64-bit; RAM: 8 GB; 20 GB dedicated to the virtual hard disk; 4 cores. The database management system used is MongoDB 2.4.1, installed on a machine with the following characteristics: OS: Windows Server 2008 R2 64-bit; CPU: Intel(R) Xeon E5620 @ 2.40 GHz; RAM: 12 GB.
13. http://www.imdb.com/chart/top
14. http://www.jinni.com/
15. http://wordnet.princeton.edu/

References

1. Resnick, P., Varian, H.R.: Recommender systems. Commun. ACM 40, 56–58 (1997)
2. Rashid, A.M., Karypis, G., Riedl, J.: Learning preferences of new users in recommender systems: an information theoretic approach. SIGKDD Explor. Newsl. 10, 90–100 (2008)
3. Bergamaschi, S., Po, L., Sorrentino, S.: Comparing topic models for a movie recommendation system. In: Proceedings of 10th International Conference on Web Information Systems and Technologies (WEBIST 2014), Barcelona, Spain, Number 2, SCITEPRESS, pp. 172–183 (2014). ISBN 978-989-758-024-6
4. Farinella, T., Bergamaschi, S., Po, L.: A non-intrusive movie recommendation system. In: Meersman, R., et al. (eds.) OTM 2012, Part II. LNCS, vol. 7566, pp. 736–751. Springer, Heidelberg (2012)
5. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975)
6. Musto, C.: Enhanced vector space models for content-based recommender systems. In: Proceedings of the Fourth ACM Conference on Recommender Systems. RecSys 2010, pp. 361–364. ACM, New York (2010)
7. Park, L.A.F., Ramamohanarao, K.: An analysis of latent semantic term self-correlation. ACM Trans. Inf. Syst. 27, 8:1–8:35 (2009)
8. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)
9. Griffiths, T., Steyvers, M., Tenenbaum, J.: Topics in semantic representation. Psychol. Rev. 114, 211–244 (2007)
10. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
11. Sorrentino, S., Bergamaschi, S., Parmiggiani, E.: A supervised method for lexical annotation of schema labels based on wikipedia. In: Atzeni, P., Cheung, D., Ram, S. (eds.) ER 2012 Main Conference 2012. LNCS, vol. 7532, pp. 359–368. Springer, Heidelberg (2012)
12. Gunawardana, A., Shani, G.: A survey of accuracy evaluation metrics of recommendation tasks. J. Mach. Learn. Res. 10, 2935–2962 (2009)
13. Ekstrand, M.D., Riedl, J.T., Konstan, J.A.: Collaborative filtering recommender systems. Found. Trends Hum.-Comput. Interact. 4, 81–173 (2011)
14. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17, 734–749 (2005)
15. Lee, M.D., Welsh, M.: An empirical evaluation of models of text document similarity. In: Proceedings of the 27th Annual Conference of the Cognitive Science Society. CogSci2005, pp. 1254–1259. Erlbaum (2005)
16. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42, 30–37 (2009)
17. Gemulla, R., Nijkamp, E., Haas, P.J., Sismanis, Y.: Large-scale matrix factorization with distributed stochastic gradient descent. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011, pp. 69–77. ACM, New York (2011)
18. Rehurek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, ELRA, pp. 45–50 (2010)
19. Jin, X., Mobasher, B., Zhou, Y.: A web recommendation system based on maximum entropy. In: ITCC, pp. 213–218. IEEE Computer Society (2005)
20. Krestel, R., Fankhauser, P., Nejdl, W.: Latent dirichlet allocation for tag recommendation. In: Bergman, L.D., Tuzhilin, A., Burke, R.D., Felfernig, A., Schmidt-Thieme, L. (eds.) RecSys, pp. 61–68. ACM (2009)
21. Shi, Y., Larson, M., Hanjalic, A.: Mining contextual movie similarity with matrix factorization for context-aware recommendation. ACM Trans. Intell. Syst. Technol. 4, 16:1–16:19 (2013)
22. Moshfeghi, Y., Piwowarski, B., Jose, J.M.: Handling data sparsity in collaborative filtering using emotion and semantic based features. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2011, pp. 625–634. ACM, New York (2011)
23. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41, 1–69 (2009)
24. Po, L., Sorrentino, S.: Automatic generation of probabilistic relationships for improving schema matching. Inf. Syst. 36, 192–208 (2011)
25. Sorrentino, S., Bergamaschi, S., Gawinecki, M., Po, L.: Schema label normalization for improving schema matching. Data Knowl. Eng. 69, 1254–1273 (2010)
26. Bergamaschi, S., Bouquet, P., Giacomuzzi, D., Guerra, F., Po, L., Vincini, M.: An incremental method for the lexical annotation of domain ontologies. Int. J. Semantic Web Inf. Syst. 3, 57–80 (2007)

Acknowledgements

The system has been developed in collaboration between the database group of the University of Modena and Reggio Emilia and vfree.tv (http://vfree.tv), a young and innovative German company focused on creating new ways of distributing television content and generating an unprecedented watching experience for the user.

We also want to express our gratitude to Tania Farinella, Matteo Abbruzzo and Olga Kryukova, master's students in Computer Engineering and Science at the Department of Engineering “Enzo Ferrari” of the University of Modena and Reggio Emilia, for their contribution to implementing the first and second versions of the system (without and with LDA) and for their support during the evaluation of the system. Particular appreciation goes to Serena Sorrentino, who helped us integrate the LDA models into our system.

Author information

Authors and Affiliations

Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, 41125, Modena, Italy
Sonia Bergamaschi & Laura Po

Corresponding author

Correspondence to Laura Po.

Editor information

Editors and Affiliations

University of Paris, Paris, France
Valérie Monfort
RWTH Aachen University, Aachen, Germany
Karl-Heinz Krempels

Cite this paper

Bergamaschi, S., Po, L. (2015). Comparing LDA and LSA Topic Models for Content-Based Movie Recommendation Systems. In: Monfort, V., Krempels, KH. (eds) Web Information Systems and Technologies. WEBIST 2014. Lecture Notes in Business Information Processing, vol 226. Springer, Cham. https://doi.org/10.1007/978-3-319-27030-2_16

DOI: https://doi.org/10.1007/978-3-319-27030-2_16
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27029-6
Online ISBN: 978-3-319-27030-2
eBook Packages: Computer Science, Computer Science (R0)
