Time-sync comments denoising via graph convolutional and contextual encoding☆
Introduction
Online video websites such as YouTube have boomed in recent years and become some of the most significant applications on the Internet. The ways users interact with video content also vary across these websites. Time-Sync Comments (TSCs; also called Danmu in Chinese and Komento in Japanese) are a new form of textual information in video content that has been adopted by many online video websites, such as AcFun and Bilibili in China and Niconico in Japan. User comments are overlaid directly onto the video, synced to a specific playback time. As a new kind of crowd-sourced user review, TSCs hold great potential for video resource management and video information retrieval thanks to their increasing popularity. Substantial research on TSCs has emerged since 2014, including video tagging, highlight detection, and video key-frame recommendation. However, as noted in these works, TSCs contain many low-quality comments in the form of advertisements, meaningless character strings, and irrelevant remarks. A typical example is illustrated in Fig. 1. For online video websites, these low-quality comments directly degrade the user experience of watching videos and need to be cleaned, or "denoised". In practice, this cleaning is done manually by human administrators, which costs a huge amount of time and effort. Moreover, for researchers, these low-quality comments clearly harm downstream tasks such as video content understanding and degrade the expected performance of many approaches.
To overcome this issue, we are the first to study the TSC denoising problem in this paper. One challenge in identifying low-quality TSCs, however, is the lack of labels. According to our survey, most TSC-supported video websites provide only naive filtering strategies, such as hiding old comments according to their posting time or banning blacklisted words. Bilibili.com developed a system that judges TSC quality from user behavior: comments that video uploaders liked are collected as positive samples and comments reported by viewers as negative samples, and a classifier is then trained on these labeled data in a supervised manner. However, compared with the number of unlabeled comments, the labeled samples are extremely sparse, so the classifier cannot reach satisfactory performance. In addition, this approach ignores the contextual information of TSCs, which is crucial for evaluating comment quality in this scenario.
The most direct way to improve TSC filtering is to add more labels to TSCs. However, manually labeling low-quality comments is infeasible given their volume (usually up to thousands per episode). We therefore have to make use of the limited labeled comments and the relations among TSCs. Recently, a research direction called graph neural networks has attracted wide attention. It shows great potential on many tasks, especially semi-supervised learning tasks that exploit rich relational structure, such as node classification and community detection. In this paper, we propose GCCED, a method based on Graph Convolutional and Contextual Encoding for Denoising. Our main contributions are threefold.
- •
We highlight the importance of the TSC denoising problem; to the best of our knowledge, this is the first work to study it.
- •
We propose an end-to-end classification model that learns to identify low quality TSCs in a semi-supervised way, featured by graph convolutional encoding and contextual encoding.
- •
We experiment on a real-world dataset and extensively evaluate the performance of our approach.
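As a hedged illustration of the graph-convolutional propagation that semi-supervised node classification builds on (the function and variable names below are our own, not the paper's code), a single GCN layer H' = ReLU(Â H W), with Â the symmetrically normalized adjacency with self-loops, can be sketched as:

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """One graph-convolutional layer: aggregate neighbor features, then transform."""
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU activation

# Toy graph: 3 nodes, one edge between nodes 0 and 1.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
H = np.eye(3)        # one-hot node features
W = np.ones((3, 2))  # toy weight matrix
out = gcn_layer(normalize_adjacency(A), H, W)
print(out.shape)  # (3, 2)
```

Stacking a few such layers lets label information from the sparse labeled nodes spread along graph edges to unlabeled nodes, which is what makes the semi-supervised setting workable.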
The rest of this paper is organized as follows. We first summarize recent work on TSC-related problems in Section 2. Next, we define related concepts and formulate the TSC denoising problem in Section 3. Then, in Section 4, we introduce the details of our model, followed by experiments conducted on a real-world dataset in Section 5. Finally, we conclude with a discussion and future work on the TSC denoising problem in Section 6.
Section snippets
Time-sync comments
Time-sync comments (TSCs) provide a new source of information about videos and have received growing research interest. The first substantial work on TSCs is the temporal and personalized topic modeling of Wu et al. [1]. As a fine-grained text source for videos, TSCs show great potential for content-based video analysis. For video highlight detection, Xian et al. [2] first represented video shots by latent topics of TSCs and proposed a centroid-diffusion algorithm to detect highlights. Lv
Problem formulation
In this paper, our target is to detect low-quality time-sync comments (TSCs) within a season, which is a cycle or set of episodes of a television program. We start by introducing the concepts of season, episode and time-sync comment, followed by the formal definition of the TSC denoising problem.
In general, a season s consists of a set of episodes Es, and each episode e ∈ Es is associated with a set of time-sync comments Xe. Each time-sync comment x ∈ Xe is defined as a
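Read literally, the formulation gives a season → episodes → comments hierarchy. A minimal sketch of these structures (the field names and the label convention are our assumptions, since the snippet is truncated):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TimeSyncComment:
    text: str                     # comment content
    playback_time: float          # video timestamp (seconds) the comment is synced to
    label: Optional[int] = None   # 1 = low quality, 0 = normal, None = unlabeled

@dataclass
class Episode:
    episode_id: str
    comments: List[TimeSyncComment] = field(default_factory=list)  # the set X_e

@dataclass
class Season:
    season_id: str
    episodes: List[Episode] = field(default_factory=list)  # the set E_s

# A season s with one episode e and its comment set X_e.
e = Episode("e1", [TimeSyncComment("Great scene!", 12.5),
                   TimeSyncComment("buy followers here", 13.0, label=1)])
s = Season("s1", [e])
print(len(s.episodes[0].comments))  # 2
```

In this framing, TSC denoising amounts to predicting the missing `label` fields from the few labeled comments plus the relations among comments.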
Methodology
We propose an end-to-end neural model called GCCED to assess TSC quality for the TSC denoising problem. The overall pipeline of our model is shown in Fig. 2. Specifically, we first preprocess raw TSCs to handle informal expressions and learn TSC representations. We then propose two representation components to capture the semantic features of TSCs: a graph convolutional network and contextual encoding. Information contained in the limited labels can be propagated to more
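A hedged sketch of the contextual side of this pipeline, under our own assumption that a comment's "context" is the set of TSCs posted within a small playback-time window around it (the paper's exact definition may differ):

```python
def context_window(comments, target_idx, window=5.0):
    """Return the comments whose playback time falls within +/- `window`
    seconds of the target comment (the target itself excluded)."""
    t = comments[target_idx][1]
    return [c for i, c in enumerate(comments)
            if i != target_idx and abs(c[1] - t) <= window]

# Each comment is a (text, playback_time) pair.
tscs = [("lol", 10.0), ("spam link", 11.5), ("great shot", 30.0)]
print([c[0] for c in context_window(tscs, 0)])  # ['spam link']
```

A contextual encoder would then embed each comment jointly with its window, so that an off-topic or advertising comment stands out against its temporally adjacent neighbors.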
Experiments
In this section, we extensively evaluate the performance of our model on a real-world TSC dataset. We begin by introducing the dataset and experimental setup, followed by quantitative and qualitative analyses of our model. The source code and datasets are available online.
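For the quantitative side, binary precision, recall, and F1 on held-out labeled comments are the natural measures for low-quality detection; a minimal sketch (the metric choice is our assumption, since the snippet does not list the metrics):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics, with label 1 = low-quality comment."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = precision_recall_f1([1, 0, 1, 1], [1, 0, 0, 1])
print(round(p, 2), round(r, 2), round(f, 2))  # 1.0 0.67 0.8
```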
Conclusion and future work
In this study, we propose a graph-based TSC denoising model to identify low-quality comments in videos. First, we formally define the TSC denoising problem. Then, we build a word graph over the whole corpus. We design graph convolutional encoding and contextual encoding methods to exploit the relations between TSCs and their corresponding contexts. Owing to the effectiveness of graph convolution and contextual encoding, our model outperforms other state-of-the-art methods on the dataset of
Declaration of Competing Interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Acknowledgments
This study was funded by the National Natural Science Foundation of China (Grant nos. 61702372, 61572365 and 61772371).
References (28)
- et al., Exploring the emerging type of comment for online videos: danmu, TWEB, 2017.
- et al., Crowdsourced time-sync video tagging using temporal and personalized topic modeling, Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
- et al., Video highlight shot extraction with time-sync comment, Proceedings of the 7th International Workshop on Hot Topics in Planet-Scale mObile Computing and Online Social neTworking, 2015.
- et al., Reading the videos: temporal labeling for crowdsourced time-sync videos based on semantic embedding, Thirtieth AAAI Conference on Artificial Intelligence, 2016.
- et al., Personalized key frame recommendation, SIGIR, 2017.
- et al., Bridging video content and comments: synchronized video description with temporal summarization of crowdsourced time-sync comments, AAAI, 2017.
- S. Ma, L. Cui, D. Dai, F. Wei, X. Sun, LiveBot: generating live video comments based on visual and textual contexts...
- et al., TSCSet: a crowdsourced time-sync comment dataset for exploration of user experience improvement, IUI, 2018.
- et al., Convolutional neural networks on graphs with fast localized spectral filtering, Advances in Neural Information Processing Systems, 2016.
- T.N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv:1609.02907...
- Inductive representation learning on large graphs, NeurIPS.
- Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems.
☆ Editor: Umapada Pal