Automatic Videography Generation from Audio Tracks

  • Conference paper
Advances in Information Retrieval (ECIR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13982)

Abstract

This paper describes a prototype of an automatic videography generation system. Given a YouTube video of a song, the system retrieves a set of images for each line of the song, then automatically inserts them into a video track and aligns them with that line.
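
The abstract does not say how the retrieved images are placed on the video track. As a minimal sketch only, assuming each lyric line already carries start and end timestamps and that the images retrieved for a line share that line's time span equally, the alignment step could look like the following. The names `LyricLine`, `ImageSegment`, `build_timeline`, and `toy_retrieve` are hypothetical and do not come from the paper, and `toy_retrieve` is a stand-in for a real image search.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class LyricLine:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


@dataclass(frozen=True)
class ImageSegment:
    start: float
    end: float
    image: str    # path or URL of the image shown in this interval


def build_timeline(
    lines: Sequence[LyricLine],
    retrieve: Callable[[str], Sequence[str]],
    images_per_line: int = 2,
) -> List[ImageSegment]:
    """Split each lyric line's time span equally among its retrieved images.

    Assumption (not from the paper): equal-length slots per image.
    """
    timeline: List[ImageSegment] = []
    for line in lines:
        images = list(retrieve(line.text))[:images_per_line]
        # Skip lines with no images or with an empty or negative duration.
        if not images or line.end <= line.start:
            continue
        slot = (line.end - line.start) / len(images)
        for i, image in enumerate(images):
            seg_start = line.start + i * slot
            timeline.append(ImageSegment(seg_start, seg_start + slot, image))
    return timeline


def toy_retrieve(query: str) -> List[str]:
    """Hypothetical retriever: one fake image name per word longer than 3 letters."""
    return [f"{word}.jpg" for word in query.lower().split() if len(word) > 3]


if __name__ == "__main__":
    # Invented lyric lines with invented timestamps, for illustration only.
    demo_lines = [
        LyricLine(0.0, 4.0, "golden desert sky"),
        LyricLine(4.0, 6.0, "the river runs"),
    ]
    for segment in build_timeline(demo_lines, toy_retrieve):
        print(segment)
```

The resulting list of segments could then be handed to a video-editing library to render each image for its interval over the original audio; that rendering step is outside this sketch.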

Notes

  1. https://tinyurl.com/d5x32aet.

Corresponding author

Correspondence to Debasis Ganguly.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ganguly, D., Parker, A., Aji, S. (2023). Automatic Videography Generation from Audio Tracks. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_27

  • DOI: https://doi.org/10.1007/978-3-031-28241-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28240-9

  • Online ISBN: 978-3-031-28241-6

  • eBook Packages: Computer Science, Computer Science (R0)
