DOI: 10.1145/3483845.3483878

Exploratory Analysis on Topic Modelling for Video Subtitles

Published: 22 October 2021

ABSTRACT

In this paper, we explore the different models available for performing topic modelling on subtitle files. Subtitle files are sourced from movies and represent the dialogue being spoken, so applying topic modelling to them means trying to obtain the topics of a video from its subtitles alone. Our novel idea is to test whether it is feasible to use topic modelling on subtitles to obtain the topics of a movie. While topic modelling has previously been used in bioinformatics, patent indexing and many other areas, it has not seen any application in this sphere. We search extensively for datasets, preprocess the subtitle files, and apply three topic modelling methods to these documents: Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Processes (HDP) and Latent Semantic Indexing (LSI). These are three of the most prominent topic modelling methods in use today. Our results indicate which of these models works best for subtitle files.
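The abstract does not say how the subtitle files were preprocessed or which implementation of each model was used. As a rough illustration only, the Python sketch below assumes SubRip (`.srt`) input, keeps only the dialogue lines (dropping cue numbers, timestamps and formatting tags), removes short words and a small hypothetical stopword list, and then fits LDA using collapsed Gibbs sampling. Collapsed Gibbs sampling is a swapped-in inference method chosen because it fits in a few lines of standard-library code; it is not necessarily what the authors used, and HDP and LSI are not shown. The names `srt_to_tokens`, `lda_gibbs`, `top_words` and `STOPWORDS`, as well as the hyperparameter defaults, are hypothetical.

```python
import random
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "you", "that", "this", "are", "was", "for", "with", "have", "not", "but", "what"}

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")
# Formatting tags such as <i>...</i> or {\an8}.
TAG = re.compile(r"<[^>]+>|\{[^}]*\}")


def srt_to_tokens(srt_text):
    """Keep only the spoken dialogue of an SRT file and return lowercase tokens."""
    tokens = []
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers and timing lines.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        line = TAG.sub(" ", line).lower()
        for word in re.findall(r"[a-z']+", line):
            word = word.strip("'")
            # Assumption: words of two letters or fewer carry little topical signal.
            if len(word) > 2 and word not in STOPWORDS:
                tokens.append(word)
    return tokens


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the vocabulary and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]              # n_dk
    topic_word = [[0] * n_vocab for _ in range(num_topics)]   # n_kv
    topic_total = [0] * num_topics                            # n_k
    assignments = []
    # Give every token a random initial topic and record it in the counts.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=5):
    """Return the n highest-count words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]


if __name__ == "__main__":
    sample = """1
00:00:01,000 --> 00:00:03,500
<i>The ship is leaving the harbour.</i>

2
00:00:04,000 --> 00:00:06,000
Captain, the storm is coming!
"""
    docs = [srt_to_tokens(sample),
            ["detective", "murder", "suspect", "alibi", "detective", "murder"]]
    vocab, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
    for k, words in enumerate(top_words(vocab, topic_word, n=3)):
        print(f"topic {k}: {words}")
```

In a real setting each document would be the token list of one movie's full subtitle file. Comparing LDA against HDP and LSI would usually rely on library implementations and a quantitative score such as topic coherence rather than this toy sampler.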


Published in
CCRIS '21: Proceedings of the 2021 2nd International Conference on Control, Robotics and Intelligent System
August 2021, 278 pages
ISBN: 9781450390453
DOI: 10.1145/3483845

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article; Research; Refereed limited

Article metrics: Downloads (last 12 months): 17; Downloads (last 6 weeks): 1
