Abstract
This paper describes a prototype of an automatic videography generation system. Given any YouTube video of a song, the system retrieves a set of images for each line of the song, then automatically inserts them into a video track and aligns them with it.
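The alignment step can be pictured as a timeline computation. The sketch below is illustrative only and is not the authors' implementation. It assumes each lyric line already carries a start time and a duration, uses a hypothetical `retrieve_images` callback as a stand-in for real image search, and splits each line's time span evenly among the images returned for it. All names here (`LyricLine`, `ImageSegment`, `build_timeline`, `fake_retrieve`) and the sample lines are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LyricLine:
    text: str
    start: float     # seconds from the start of the audio
    duration: float  # seconds for which the line is sung


@dataclass
class ImageSegment:
    image: str       # hypothetical image identifier or file name
    start: float     # seconds at which the image appears
    end: float       # seconds at which the image disappears


def build_timeline(
    lines: List[LyricLine],
    retrieve_images: Callable[[str], List[str]],
) -> List[ImageSegment]:
    """Split each lyric line's time span evenly among its retrieved images."""
    timeline: List[ImageSegment] = []
    for line in lines:
        images = retrieve_images(line.text)
        if not images or line.duration <= 0:
            continue  # nothing to show for this line
        slot = line.duration / len(images)
        for i, image in enumerate(images):
            seg_start = line.start + i * slot
            timeline.append(ImageSegment(image, seg_start, seg_start + slot))
    return timeline


def fake_retrieve(text: str) -> List[str]:
    """Stand-in for image search: one placeholder file per word longer than 4 letters."""
    return [f"{word.lower()}.jpg" for word in text.split() if len(word) > 4]


# Made-up lyric lines with made-up timings (assumptions for illustration).
lines = [
    LyricLine("walking through the desert sunshine", start=0.0, duration=4.0),
    LyricLine("rivers running under silver bridges", start=4.0, duration=2.5),
]
timeline = build_timeline(lines, fake_retrieve)

for segment in timeline:
    print(f"{segment.start:5.2f}s to {segment.end:5.2f}s  {segment.image}")
```

In a real system the placeholder file names would be replaced by retrieved images, and the resulting segments would be handed to a video editing library to render the track. An even split is only the simplest policy. Weighting each image by where its keyword falls within the line would be a natural refinement.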
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ganguly, D., Parker, A., Aji, S. (2023). Automatic Videography Generation from Audio Tracks. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_27
DOI: https://doi.org/10.1007/978-3-031-28241-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28240-9
Online ISBN: 978-3-031-28241-6
eBook Packages: Computer Science, Computer Science (R0)