ABSTRACT
In this paper, we explore the models available for performing topic modelling on subtitle files. Subtitle files are sourced from movies and contain the dialogue spoken on screen, so applying topic modelling to them means trying to recover the topics of a video from its subtitles alone. Our novel idea is to test whether topic modelling on subtitles is a feasible way to obtain the topics of a movie. Although topic modelling has previously been applied in bioinformatics, patent indexing and many other areas, it has not yet been applied in this domain. We extensively search for datasets, preprocess the subtitle files, and apply three topic modelling methods to these documents: Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Processes (HDP) and Latent Semantic Indexing (LSI), three of the most prominent topic models in use today. Our results indicate which model works best for subtitle files.
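The abstract does not specify the subtitle format or the exact preprocessing steps. As a minimal sketch, assuming SubRip (.srt) input, preprocessing can discard cue numbers, timing lines and formatting tags, then tokenize the remaining dialogue into a bag of words. The stopword list, the minimum token length and all function names below are hypothetical illustrations, not the authors' pipeline.

```python
import re
from collections import Counter

# Assumption: input is SubRip (.srt): blocks of a cue number, a timing line
# such as "00:00:01,000 --> 00:00:03,500", and one or more dialogue lines.
TIMING_RE = re.compile(r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->")
TAG_RE = re.compile(r"<[^>]+>|\{[^}]*\}")  # <i>...</i> and {\an8}-style tags
WORD_RE = re.compile(r"[a-z']+")

# Hypothetical, deliberately tiny stopword list; a real pipeline would use
# a full list.
STOPWORDS = {"a", "an", "and", "the", "i", "you", "to", "of", "is", "it",
             "in", "that", "we", "he", "she", "they", "what", "do", "are",
             "be", "on", "with", "for", "me", "my"}


def srt_to_dialogue(srt_text: str) -> str:
    """Keep only the spoken dialogue lines of an .srt file, joined by spaces."""
    kept = []
    for raw in srt_text.lstrip("\ufeff").splitlines():  # drop a leading BOM
        line = raw.strip()
        # Skip blank separators, cue numbers and timing lines
        # (note: a dialogue line made only of digits is also skipped).
        if not line or line.isdigit() or TIMING_RE.match(line):
            continue
        cleaned = TAG_RE.sub("", line).strip()
        if cleaned:
            kept.append(cleaned)
    return " ".join(kept)


def tokenize(text: str, min_len: int = 3) -> list[str]:
    """Lowercase, split into words, drop stopwords and very short tokens."""
    words = (w.strip("'") for w in WORD_RE.findall(text.lower()))
    return [w for w in words if len(w) >= min_len and w not in STOPWORDS]


def bag_of_words(srt_text: str) -> Counter:
    """Term counts for one movie, i.e. one document of the corpus."""
    return Counter(tokenize(srt_to_dialogue(srt_text)))


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
<i>The ship is sinking!</i>

2
00:00:04,000 --> 00:00:06,000
Get the lifeboats ready.
"""

if __name__ == "__main__":
    print(srt_to_dialogue(SAMPLE))
    print(tokenize(srt_to_dialogue(SAMPLE)))
    print(bag_of_words(SAMPLE))
```

Each movie's token list becomes one document in the corpus. Such token lists could then be passed to an off-the-shelf implementation of the three models; for instance, gensim provides `corpora.Dictionary`, `LdaModel`, `HdpModel` and `LsiModel`, although the abstract does not say which library the authors used.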
Index Terms
- Exploratory Analysis on Topic Modelling for Video Subtitles