DCA: Diversified Co-attention Towards Informative Live Video Commenting

  • Conference paper
  • In: Natural Language Processing and Chinese Computing (NLPCC 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)

Abstract

We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments with both video frames and other viewers’ comments as inputs. A major challenge in this task is how to properly leverage the rich and diverse information carried by video and text. In this paper, we aim to collect diversified information from video and text for informative comment generation. To achieve this, we propose a Diversified Co-Attention (DCA) model for this task. Our model builds bidirectional interactions between video frames and surrounding comments from multiple perspectives via metric learning, to collect a diversified and informative context for comment generation. We also propose an effective parameter orthogonalization technique to avoid excessive overlap of information learned from different perspectives. Results show that our approach outperforms existing methods in the ALVC task, achieving new state-of-the-art results.
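Only the abstract is available on this page, so as a rough illustration (not the authors' implementation), the two core ideas can be sketched in NumPy: (1) attention between video frames and comment tokens scored under several learned bilinear metrics, one per "perspective", in the spirit of metric learning; and (2) a soft penalty that pushes the perspective parameter matrices toward mutual orthogonality so they do not learn overlapping information. All function names, shapes, and the exact form of the metric are hypothetical.

```python
import numpy as np

def perspective_attention(V, T, Ws):
    """For each perspective k, score frame-token pairs with a learned
    bilinear metric W_k: S_k = V @ W_k @ T.T, softmax over tokens, and
    return text-aware frame representations A_k @ T."""
    outputs = []
    for W in Ws:
        S = V @ W @ T.T                       # (n_frames, n_tokens) similarities
        A = np.exp(S - S.max(axis=1, keepdims=True))
        A = A / A.sum(axis=1, keepdims=True)  # attention weights over tokens
        outputs.append(A @ T)                 # (n_frames, d) attended text context
    return outputs

def orthogonality_penalty(Ws):
    """Soft constraint: flatten and L2-normalize each perspective matrix,
    then penalize deviation of their Gram matrix from the identity, so
    distinct perspectives stay (near-)orthogonal."""
    flat = np.stack([W.ravel() for W in Ws])
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    G = flat @ flat.T                         # pairwise cosine similarities
    return np.sum((G - np.eye(len(Ws))) ** 2)
```

In training, the penalty would be added to the generation loss with a small weight; the paper's actual "parameter orthogonalization technique" may differ (e.g. a hard re-projection rather than a soft penalty).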


Notes

  1. We concatenate all surrounding comments into a single sequence \(\textit{\textbf{x}}\).

  2. https://github.com/lancopku/livebot.

  3. https://www.bilibili.com.


Author information

Corresponding author

Correspondence to Zhihan Zhang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Yin, Z., Ren, S., Li, X., Li, S. (2020). DCA: Diversified Co-attention Towards Informative Live Video Commenting. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_1

  • DOI: https://doi.org/10.1007/978-3-030-60457-8_1
  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60456-1

  • Online ISBN: 978-3-030-60457-8

  • eBook Packages: Computer Science, Computer Science (R0)
