Abstract
As video-sharing platforms grow in popularity, content creators face strong demand to produce content that attracts large audiences. Many factors relate to engagement: visuals, sound, transcript, title, and more. To account for these factors, we propose a deep multi-modal hybrid fusion model for YouTube video engagement classification. Our architecture makes it easy to adapt state-of-the-art models to a particular task or set of modalities, and then fuses their outputs to obtain richer information for more accurate classification. A proposed residual block, serving as a simple form of neural architecture search, further improves the extracted features. Our work is among the first to classify YouTube video engagement and promises to broaden the research community’s reach. Through detailed experiments, we show that the model achieves state-of-the-art results on the YouTube video engagement analytics problem.
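The abstract does not spell out the fusion mechanism, but the idea of combining per-modality embeddings (e.g. text, thumbnail, audio features from adapted backbone models) and refining the fused vector with a residual block can be sketched as below. This is a minimal NumPy illustration under assumed dimensions; `residual_block` and `hybrid_fusion` are hypothetical names, not the authors' implementation.

```python
import numpy as np

def residual_block(x, w1, w2):
    # Hypothetical residual refinement: two linear maps with ReLU,
    # with the input added back via a skip connection.
    h = np.maximum(0, x @ w1)
    return x + np.maximum(0, h @ w2)

def hybrid_fusion(modality_embeddings, weights):
    # Late fusion: concatenate per-modality feature vectors
    # (e.g. from a text encoder, an image backbone, an audio CNN),
    # then refine the joint representation with a residual block.
    fused = np.concatenate(modality_embeddings)
    return residual_block(fused, *weights)

# Toy example: three modalities, each a 4-dim embedding.
rng = np.random.default_rng(0)
embs = [rng.standard_normal(4) for _ in range(3)]
w1 = rng.standard_normal((12, 12))
w2 = rng.standard_normal((12, 12))
out = hybrid_fusion(embs, (w1, w2))
print(out.shape)  # shape of the fused, refined representation
```

In practice each embedding would come from a modality-specific pretrained model, and the fused vector would feed a classification head; the skip connection lets the block fall back to the plain concatenation when refinement does not help.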
Acknowledgment
This research is supported by research funding from Faculty of Information Technology, University of Science, Vietnam National University - Ho Chi Minh City, and Gender & Diversity Project - APNIC Foundation.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen-Thi, MV., Le, H., Le, T., Le, T., Nguyen, H.T. (2023). Youtube Engagement Analytics via Deep Multimodal Fusion Model. In: Wang, H., et al. Image and Video Technology. PSIVT 2022. Lecture Notes in Computer Science, vol 13763. Springer, Cham. https://doi.org/10.1007/978-3-031-26431-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26430-6
Online ISBN: 978-3-031-26431-3
eBook Packages: Computer Science (R0)