Abstract
As video-sharing platforms grow in popularity, content creators face strong demand to produce content that attracts large audiences. Many factors relate to engagement: visuals, sound, transcript, title, and more. To account for these factors, we propose a deep multi-modal hybrid fusion model for YouTube video engagement classification. Our architecture makes it easy to adapt state-of-the-art models to a particular task or set of modalities, and then fuses their outputs to obtain richer information for more accurate classification. A proposed residual block, serving as a simple form of neural architecture search, further improves the extracted features. Our work is among the first to classify YouTube video engagement and promises to broaden the research community’s reach. Through detailed experiments, we show that the model achieves state-of-the-art results on the YouTube video engagement analytics problem.
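The abstract does not spell out the fusion mechanism, but the idea of combining per-modality embeddings (e.g. text, thumbnail, audio features from adapted backbone models) and refining the fused vector with a residual block can be sketched as below. This is a minimal NumPy illustration under assumed dimensions; `residual_block` and `hybrid_fusion` are hypothetical names, not the authors' implementation.

```python
import numpy as np

def residual_block(x, w1, w2):
    # Hypothetical residual refinement: two linear maps with ReLU,
    # with the input added back via a skip connection.
    h = np.maximum(0, x @ w1)
    return x + np.maximum(0, h @ w2)

def hybrid_fusion(modality_embeddings, weights):
    # Late fusion: concatenate per-modality feature vectors
    # (e.g. from a text encoder, an image backbone, an audio CNN),
    # then refine the joint representation with a residual block.
    fused = np.concatenate(modality_embeddings)
    return residual_block(fused, *weights)

# Toy example: three modalities, each a 4-dim embedding.
rng = np.random.default_rng(0)
embs = [rng.standard_normal(4) for _ in range(3)]
w1 = rng.standard_normal((12, 12))
w2 = rng.standard_normal((12, 12))
out = hybrid_fusion(embs, (w1, w2))
print(out.shape)  # shape of the fused, refined representation
```

In practice each embedding would come from a modality-specific pretrained model, and the fused vector would feed a classification head; the skip connection lets the block fall back to the plain concatenation when refinement does not help.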
Acknowledgment
This research is supported by research funding from Faculty of Information Technology, University of Science, Vietnam National University - Ho Chi Minh City, and Gender & Diversity Project - APNIC Foundation.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen-Thi, MV., Le, H., Le, T., Le, T., Nguyen, H.T. (2023). Youtube Engagement Analytics via Deep Multimodal Fusion Model. In: Wang, H., et al. Image and Video Technology. PSIVT 2022. Lecture Notes in Computer Science, vol 13763. Springer, Cham. https://doi.org/10.1007/978-3-031-26431-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26430-6
Online ISBN: 978-3-031-26431-3
eBook Packages: Computer Science (R0)