Automatic Videography Generation from Audio Tracks

Ganguly, Debasis; Parker, Andrew; Aji, Stergious

doi:10.1007/978-3-031-28241-6_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13982)

Included in the following conference series: European Conference on Information Retrieval

Abstract

This paper describes a prototype system for automatic videography generation. Given any YouTube video of a song, the system retrieves a set of images for each line of the song, then automatically inserts the images into a video track, aligned with the lines they correspond to.
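The abstract does not say how the retrieved images are aligned with the song. The following is a minimal sketch of one possible alignment step, assuming each line of the song comes with start and end timestamps and that some image retrieval function is available. All names here (`LyricLine`, `Segment`, `build_timeline`, `demo_retriever`, `images_per_line`) are hypothetical and are not taken from the paper; the retriever is a stand-in that only fabricates file names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LyricLine:
    start: float  # seconds from the beginning of the audio track
    end: float    # seconds from the beginning of the audio track
    text: str


@dataclass
class Segment:
    start: float
    end: float
    image: str  # identifier (e.g. path or URL) of the image to display


def build_timeline(
    lines: List[LyricLine],
    retrieve_images: Callable[[str], List[str]],
    images_per_line: int = 2,
) -> List[Segment]:
    """Split each lyric line's time span evenly among its retrieved images.

    Assumption: an even split is used purely for illustration; the paper's
    actual alignment strategy is not described in the abstract.
    """
    timeline: List[Segment] = []
    for line in lines:
        images = retrieve_images(line.text)[:images_per_line]
        # Skip lines with no images or with an empty/invalid time span.
        if not images or line.end <= line.start:
            continue
        slot = (line.end - line.start) / len(images)
        for i, image in enumerate(images):
            timeline.append(
                Segment(
                    start=line.start + i * slot,
                    end=line.start + (i + 1) * slot,
                    image=image,
                )
            )
    return timeline


def demo_retriever(query: str) -> List[str]:
    """Hypothetical stand-in for an image search; returns made-up file names."""
    return [f"{query}-{i}.jpg" for i in range(3)]
```

In a full pipeline, the resulting segments would be handed to a video-editing step that displays each image for its time span over the original audio. That step is omitted here because the abstract gives no detail about it.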

Notes

1. https://tinyurl.com/d5x32aet

Author information

Authors and Affiliations

University of Glasgow, Glasgow, Scotland
Debasis Ganguly, Andrew Parker & Stergious Aji

Corresponding author

Correspondence to Debasis Ganguly.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Université Grenoble-Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Università della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
University of Tsukuba, Ibaraki, Japan
Hideo Joho
Dublin City University, Dublin, Ireland
Brian Davis
Dublin City University, Dublin, Ireland
Cathal Gurrin
Universität Regensburg, Regensburg, Germany
Udo Kruschwitz
Dublin City University, Dublin, Ireland
Annalina Caputo

About this paper

Cite this paper

Ganguly, D., Parker, A., Aji, S. (2023). Automatic Videography Generation from Audio Tracks. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_27

DOI: https://doi.org/10.1007/978-3-031-28241-6_27
Published: 16 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28240-9
Online ISBN: 978-3-031-28241-6
eBook Packages: Computer Science, Computer Science (R0)
