Learning from Audience Intelligence: Dynamic Labeled LDA Model for Time-Sync Commented Video Tagging

Zeng, Zehua; Xue, Cong; Gao, Neng; Wang, Lei; Liu, Zeyi

doi:10.1007/978-3-030-04182-3_48

Conference paper
First Online: 18 November 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)

Abstract

With the boom in online video uploading, video tagging has become an important means of video indexing. However, text-based video tagging methods ignore either the genre labels or the temporal differences of videos, which degrades their results. Fortunately, a new type of video, the time-sync commented video, contains large amounts of information contributed by users in their comments, and this information helps video tagging. In this paper, we propose a supervised dynamic Latent Dirichlet Allocation model that uses the variational topics of time-sync comments to extract both genre labels and keywords as tags. We also conduct experiments on large-scale real-world datasets, and the effectiveness of our model is demonstrated in both genre label classification and keyword extraction in comparison with baseline models.

Notes

1. https://www.imdb.com/title/tt5311514/
2. https://www.imdb.com/
3. https://www.viki.com/
4. http://www.bilibili.com/
5. These words and topic names were manually translated into English by the authors.
6. A Chinese internet slang term meaning laughing.

Acknowledgments

This work is partially supported by the National Key Research and Development Program of China.

Author information

Authors and Affiliations

School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Zehua Zeng
State Key Laboratory of Information Security, Chinese Academy of Sciences, Beijing, China
Zehua Zeng & Neng Gao
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Zehua Zeng, Cong Xue, Neng Gao, Lei Wang & Zeyi Liu

Corresponding author

Correspondence to Cong Xue.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa

About this paper

Cite this paper

Zeng, Z., Xue, C., Gao, N., Wang, L., Liu, Z. (2018). Learning from Audience Intelligence: Dynamic Labeled LDA Model for Time-Sync Commented Video Tagging. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_48

DOI: https://doi.org/10.1007/978-3-030-04182-3_48
Published: 18 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3
eBook Packages: Computer Science, Computer Science (R0)
