ABSTRACT
Recent improvements in content-based video search have produced systems with promising accuracy, opening up the possibility of offering interactive content-based video search to the general public. We present an interactive system, built on a state-of-the-art content-based video search pipeline, that enables users to perform multimodal text-to-video and video-to-video search in large video collections and to refine queries incrementally through relevance feedback and model visualization. Its comprehensive functionality also supports flexible formulation of multimodal queries with different characteristics. Quantitative and qualitative analyses show that the system can help users incrementally build effective queries for complex event topics.
Index Terms
- Incremental Multimodal Query Construction for Video Search