LSI Based Mechanism for Educational Videos Retrieval by Transcripts Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12489)

Abstract

Retrieving relevant educational videos by NLP analysis of their transcripts is a specific information retrieval problem found in many systems. Since many indexing techniques are available, choosing the components that make up an efficient data analysis pipeline is a critical task. This paper tackles the problem of retrieving the top-N videos relevant to a query written in Spanish at Universitat Politècnica de València (UPV). The main elements of the processing pipeline are clustering, LSI modelling and Wikipedia contextualization, along with basic NLP techniques such as bag-of-words, lemmatization, singularization and TF-IDF computation. Experiments on a real-world dataset of 15,386 transcripts show good results, especially compared with the existing search mechanism, which considers only the title and keywords of the transcripts. Although deployment in a live application may still be needed to evaluate relevance further, we conclude that the current progress is a milestone towards a system that retrieves appropriate videos for a given query.
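To make the core of such a pipeline concrete, the following is a minimal sketch of bag-of-words, TF-IDF weighting, an LSI projection and top-N ranking by cosine similarity. It is not the authors' implementation: the toy transcripts, the function names (tfidf_vectors, lsi_basis, top_n), the smoothed IDF formula and the choice of two latent dimensions are all assumptions made for illustration. The clustering and Wikipedia contextualization steps are omitted, and the transcripts are assumed to be already lemmatized and singularized. The LSI step is approximated in pure Python by power iteration rather than a library SVD.

```python
import math
from collections import Counter

# Hypothetical, already lemmatized transcripts (toy data, not from the paper).
TRANSCRIPTS = [
    ["matriz", "vector", "algebra", "lineal"],
    ["matriz", "determinante", "algebra"],
    ["celula", "proteina", "biologia"],
    ["celula", "membrana", "biologia", "proteina"],
    ["vector", "espacio", "lineal"],
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tfidf_vectors(docs):
    """Return a vectorizer and the TF-IDF vector of every document."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (an assumption; the paper's exact weighting is not given).
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in vocab}

    def vectorize(tokens):
        tf = Counter(tokens)  # bag-of-words counts; unknown terms are ignored
        return [tf[t] * idf[t] for t in vocab]

    return vectorize, [vectorize(d) for d in docs]


def lsi_basis(doc_vecs, k, iters=200):
    """Approximate the top-k left singular vectors of the term-document
    matrix by power iteration with Gram-Schmidt deflation."""
    n_terms = len(doc_vecs[0])
    basis = []
    for i in range(k):
        v = [1.0 + 0.1 * ((j + i) % 7) for j in range(n_terms)]  # fixed start
        for _ in range(iters):
            w = [0.0] * n_terms
            for d in doc_vecs:  # w = A (A^T v), with documents as columns of A
                c = dot(d, v)
                for j, x in enumerate(d):
                    w[j] += c * x
            for b in basis:  # remove directions that were already found
                c = dot(w, b)
                w = [x - c * y for x, y in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                return basis  # the matrix rank is below k
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def top_n(query_tokens, docs, n=3, k=2):
    """Rank documents against the query in the k-dimensional latent space."""
    vectorize, doc_vecs = tfidf_vectors(docs)
    basis = lsi_basis(doc_vecs, k)

    def project(vec):
        return [dot(vec, b) for b in basis]

    q = project(vectorize(query_tokens))
    scored = [(cosine(q, project(d)), i) for i, d in enumerate(doc_vecs)]
    return sorted(scored, reverse=True)[:n]


if __name__ == "__main__":
    for score, idx in top_n(["determinante"], TRANSCRIPTS, n=3):
        print(idx, round(score, 3), TRANSCRIPTS[idx])
```

In this toy corpus only one transcript contains "determinante", yet the other two linear algebra transcripts also land in the top three because they share latent structure with it. A search over titles and keywords alone would likely miss them, which is the kind of gap the abstract describes.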

Acknowledgement

This work was partially supported by the RTI2018-095390-B-C31-AR project of the Spanish government, and by the Generalitat Valenciana (PROMETEO/2018/002) project.

Author information

Corresponding author

Correspondence to Marian Cristian Mihăescu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bleoancă, D.I., Heras, S., Palanca, J., Julian, V., Mihăescu, M.C. (2020). LSI Based Mechanism for Educational Videos Retrieval by Transcripts Processing. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. Lecture Notes in Computer Science, vol 12489. Springer, Cham. https://doi.org/10.1007/978-3-030-62362-3_9

  • DOI: https://doi.org/10.1007/978-3-030-62362-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62361-6

  • Online ISBN: 978-3-030-62362-3

  • eBook Packages: Computer Science, Computer Science (R0)
