Abstract
We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments given both video frames and other viewers' comments as inputs. A major challenge in this task is how to properly leverage the rich and diverse information carried by video and text. In this paper, we aim to collect diversified information from video and text for informative comment generation. To this end, we propose a Diversified Co-Attention (DCA) model, which builds bidirectional interactions between video frames and surrounding comments from multiple perspectives via metric learning, collecting a diversified and informative context for comment generation. We also propose an effective parameter orthogonalization technique that prevents excessive overlap among the information learned from different perspectives. Experiments show that our approach outperforms existing methods on the ALVC task, achieving new state-of-the-art results.
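The two mechanisms named in the abstract, co-attention computed under several learned metrics and a penalty that keeps those perspectives from collapsing into one another, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under our own assumptions, not the authors' released implementation: the bilinear similarity S_k = V M_k Tᵀ, the class and parameter names (`DiversifiedCoAttention`, `n_perspectives`, `fuse_video`/`fuse_text`), and the Gram-matrix form of the orthogonalization penalty are all choices we made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiversifiedCoAttention(nn.Module):
    """Sketch of multi-perspective co-attention between video and text.

    Each perspective k owns a learned metric M_k; the video-text similarity
    under that perspective is S_k = V M_k T^T, and attention runs in both
    directions (text -> video and video -> text).
    """

    def __init__(self, d_model: int, n_perspectives: int = 4):
        super().__init__()
        # One (d x d) metric matrix per perspective, initialized to identity
        # (the identity metric recovers plain dot-product attention).
        self.metrics = nn.Parameter(
            torch.eye(d_model).repeat(n_perspectives, 1, 1)
        )
        # Fuse the concatenated per-perspective contexts back to d_model.
        self.fuse_video = nn.Linear(n_perspectives * d_model, d_model)
        self.fuse_text = nn.Linear(n_perspectives * d_model, d_model)

    def forward(self, video: torch.Tensor, text: torch.Tensor):
        # video: (n, d) frame features; text: (m, d) comment-token features.
        video_ctx, text_ctx = [], []
        for M in self.metrics:                      # one perspective at a time
            sim = video @ M @ text.T                # (n, m) similarity scores
            video_ctx.append(F.softmax(sim, dim=-1) @ text)    # text-aware video
            text_ctx.append(F.softmax(sim.T, dim=-1) @ video)  # video-aware text
        return (
            self.fuse_video(torch.cat(video_ctx, dim=-1)),
            self.fuse_text(torch.cat(text_ctx, dim=-1)),
        )

    def orthogonal_penalty(self) -> torch.Tensor:
        # Flatten each metric and penalize overlap between perspectives:
        # off-diagonal entries of the Gram matrix W W^T are pushed toward
        # zero, so different perspectives capture different information.
        W = F.normalize(self.metrics.flatten(1), dim=-1)  # (K, d*d), unit rows
        gram = W @ W.T
        off_diag = gram - torch.eye(gram.size(0), device=gram.device)
        return off_diag.pow(2).sum()
```

In training, the penalty would be added to the comment-generation loss with a small weight; the feature sizes and weight below are hypothetical:

```python
dca = DiversifiedCoAttention(d_model=512, n_perspectives=4)
video_feats = torch.randn(30, 512)   # e.g. 30 sampled frame features
text_feats = torch.randn(50, 512)    # e.g. 50 comment-token features
v_out, t_out = dca(video_feats, text_feats)
reg = dca.orthogonal_penalty()       # e.g. total_loss = gen_loss + 0.1 * reg
```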
Notes
- 1. We concatenate all surrounding comments into a single sequence \(\textit{\textbf{x}}\).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Z., Yin, Z., Ren, S., Li, X., Li, S. (2020). DCA: Diversified Co-attention Towards Informative Live Video Commenting. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8