research-article

Contextual Video Recommendation by Multimodal Relevance and User Feedback

Published: 01 April 2011

Abstract

With Internet delivery of video content surging to an unprecedented level, video recommendation, which suggests relevant videos to targeted users according to their historical and current viewing or preferences, has become one of the most pervasive online video services. This article presents a novel contextual video recommendation system, called VideoReach, based on multimodal content relevance and user feedback. We consider that an online video usually consists of different modalities (i.e., visual and audio tracks, as well as associated text such as the query, keywords, and surrounding text). Therefore, the recommended videos should be relevant to the current viewing in terms of multimodal relevance. We also consider that different parts of a video hold different degrees of interest for a user, and that different features and modalities contribute differently to the overall relevance. As a result, the recommended videos should also be relevant to the current user in terms of user feedback (i.e., user click-through). We then design a unified framework for VideoReach that seamlessly integrates multimodal relevance and user feedback through relevance feedback and attention fusion. VideoReach represents one of the first attempts toward contextual recommendation driven by video content and user click-through, without assuming that a sufficient collection of user profiles is available. We conducted experiments over a large-scale, real-world video dataset and report the effectiveness of VideoReach.
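The abstract combines two ingredients: relevance between the current video and each candidate, computed separately per modality, and modality weights adjusted from user click-through. The Python sketch below illustrates that idea under stated assumptions only. Cosine similarity per modality, a plain weighted linear fusion (standing in for the attention fusion the article actually uses), and the simple update rule that shifts weight toward modalities which scored clicked videos above skipped ones are all hypothetical choices, as are the modality names, the toy feature vectors, and the `rate` parameter. None of this is VideoReach's published formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each video maps a modality name to a feature vector.
Features = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def modality_scores(current: Features, candidate: Features) -> Dict[str, float]:
    """Per-modality relevance of a candidate to the video currently being watched."""
    return {m: cosine(current[m], candidate[m]) for m in current}


def rank(current: Features, candidates: Dict[str, Features],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Linear fusion of modality scores; returns (video_id, score), best first."""
    fused = []
    for vid, feats in candidates.items():
        scores = modality_scores(current, feats)
        fused.append((vid, sum(weights[m] * s for m, s in scores.items())))
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def update_weights(weights: Dict[str, float], current: Features,
                   candidates: Dict[str, Features], clicked: List[str],
                   rate: float = 0.5) -> Dict[str, float]:
    """Hypothetical click-through feedback: move weight toward modalities that
    scored clicked videos higher than skipped ones, then renormalize to sum to 1."""
    scores = {vid: modality_scores(current, f) for vid, f in candidates.items()}
    skipped = [vid for vid in candidates if vid not in clicked]
    new = {}
    for m, w in weights.items():
        gap = _mean([scores[v][m] for v in clicked]) - _mean([scores[v][m] for v in skipped])
        new[m] = max(w + rate * gap, 1e-6)  # clamp so no weight goes negative
    total = sum(new.values())
    return {m: w / total for m, w in new.items()}


# Toy usage with made-up feature vectors.
current = {"text": [1, 0, 1], "visual": [0.2, 0.8], "audio": [0.5, 0.5]}
candidates = {
    "A": {"text": [1, 0, 1], "visual": [0.9, 0.1], "audio": [0.5, 0.5]},
    "B": {"text": [0, 1, 0], "visual": [0.2, 0.8], "audio": [0.5, 0.5]},
}
weights = {m: 1 / 3 for m in current}
before = [vid for vid, _ in rank(current, candidates, weights)]  # ['A', 'B']
weights = update_weights(weights, current, candidates, clicked=["B"])
after = [vid for vid, _ in rank(current, candidates, weights)]   # ['B', 'A']
print(before, after)
```

With equal weights, candidate A ranks first because its text matches the current video. After the user clicks B, weight moves from the text modality to the visual modality, and B rises to the top. A production system would use learned features and the article's own fusion scheme in place of these toy choices.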

Information

Published In

ACM Transactions on Information Systems, Volume 29, Issue 2
April 2011
193 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1961209

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2011
Accepted: 01 December 2010
Revised: 01 August 2010
Received: 01 January 2010
Published in TOIS Volume 29, Issue 2

Author Tags

  1. Video recommendation
  2. image retrieval
  3. relevance feedback

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 62
  • Downloads (last 6 weeks): 7
Reflects downloads up to 28 Feb 2025.

Cited By

  • (2025) A systematic literature review on incomplete multimodal learning: techniques and challenges. Systems Science & Control Engineering 13:1. DOI: 10.1080/21642583.2025.2467083. Online publication date: 26-Feb-2025.
  • (2024) Unsupervised Video Moment Retrieval with Knowledge-Based Pseudo-Supervision Construction. ACM Transactions on Information Systems 43:1, 1-26. DOI: 10.1145/3701229. Online publication date: 9-Dec-2024.
  • (2024) Variational Stochastic Multiple Auto-Encoder For Multimodal Recommendation. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3696409.3700269. Online publication date: 3-Dec-2024.
  • (2024) Dual-Stream Pre-Training Transformer to Enhance Multimodal Learning for Social Media Prediction. Proceedings of the 32nd ACM International Conference on Multimedia, 11450-11456. DOI: 10.1145/3664647.3688998. Online publication date: 28-Oct-2024.
  • (2024) Solving the Short Video Assignment Problem via Federated Learning and Group Multi-Role Assignment. 2024 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), 934-939. DOI: 10.1109/ISPA63168.2024.00124. Online publication date: 30-Oct-2024.
  • (2023) An overview of video recommender systems: state-of-the-art and research issues. Frontiers in Big Data 6. DOI: 10.3389/fdata.2023.1281614. Online publication date: 30-Oct-2023.
  • (2023) Learning and Optimization of Implicit Negative Feedback for Industrial Short-video Recommender System. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4787-4793. DOI: 10.1145/3583780.3615482. Online publication date: 21-Oct-2023.
  • (2023) Personal or General? A Hybrid Strategy with Multi-factors for News Recommendation. ACM Transactions on Information Systems 41:2, 1-29. DOI: 10.1145/3555373. Online publication date: 13-Apr-2023.
  • (2023) Popularity Debiasing from Exposure to Interaction in Collaborative Filtering. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1801-1805. DOI: 10.1145/3539618.3591947. Online publication date: 19-Jul-2023.
  • (2023) Toward Equivalent Transformation of User Preferences in Cross Domain Recommendation. ACM Transactions on Information Systems 41:1, 1-31. DOI: 10.1145/3522762. Online publication date: 9-Jan-2023.
