Abstract
With the rise of online video platforms, comment visualization systems that overlay dynamic, contextualized comments on a video, known as \(\textit{DanMu}\), have become popular in Japan and China, providing a feeling of “virtual liveness”. At the same time, however, they also bring negative effects such as goal impediment, information overload, distraction, and impolite or irrelevant comments. Several studies have addressed this problem by utilizing textual content for low-quality \(\textit{DanMu}\) detection, but they leave out the visual context and do not consider users’ watching behavior. To this end, in this paper we propose an end-to-end multimodal classification framework for low-quality \(\textit{DanMu}\) detection. Specifically, we first design a lab-based user study to investigate users’ watching patterns. Based on the discovered fixation patterns, we propose a new fusion method that combines them with the textual context; visual content is then incorporated through a further fusion mechanism. Our model outperforms the baselines on almost all classification metrics on a real-world dataset.
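The framework described above combines three modalities: textual context, eye-tracking fixation patterns, and visual content. As a minimal illustration only, and not the paper's actual fusion mechanism, the sketch below shows a simple concatenation-based late fusion of three hypothetical per-comment feature vectors followed by a logistic scoring head; all feature dimensions, names, and weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_modalities(text_feat, gaze_feat, visual_feat):
    """Late fusion by concatenation: stack the three modality vectors."""
    return np.concatenate([text_feat, gaze_feat, visual_feat])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def low_quality_score(fused, weights, bias=0.0):
    """Binary low-quality probability from the fused representation."""
    return sigmoid(fused @ weights + bias)

# Hypothetical per-comment features: 8-d text embedding,
# 4-d fixation-pattern summary, 8-d visual-frame embedding.
text_feat = rng.normal(size=8)
gaze_feat = rng.normal(size=4)
visual_feat = rng.normal(size=8)

fused = fuse_modalities(text_feat, gaze_feat, visual_feat)  # shape (20,)
weights = rng.normal(size=fused.shape[0])                   # untrained head
score = low_quality_score(fused, weights)                   # value in (0, 1)
```

In practice the paper's fusion is learned end-to-end rather than a fixed concatenation; this sketch only makes the three-modality data flow concrete.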
Notes
We will publish the dataset after the acceptance of this paper.
Acknowledgements
This work was partially supported by the grant from the National Natural Science Foundation of China (No. 62072423).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X., He, W., Xu, T., Chen, E. (2022). Low-Quality \(\textit{DanMu}\) Detection via Eye-Tracking Patterns. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science, vol 13370. Springer, Cham. https://doi.org/10.1007/978-3-031-10989-8_20
Print ISBN: 978-3-031-10988-1
Online ISBN: 978-3-031-10989-8