Article

Extreme video retrieval: joint maximization of human and computer performance

Authors: Alexander G. Hauptmann, Ming-Yu Chen

MM '06: Proceedings of the 14th ACM international conference on Multimedia

Pages 385-394

https://doi.org/10.1145/1180639.1180721

Published: 23 October 2006

Abstract

We present an efficient video search system that maximizes the use of human bandwidth while exploiting the machine's ability to learn in real time from video clips the user selects as relevant. The system exploits the human capability for rapidly scanning imagery, augmenting it with an active learning loop that tries to always present the most relevant material given the current information. Two versions of the human interface were evaluated: one with variable page sizes and manual paging, the other with a fixed page size and automatic paging. Both require the user's complete attention and focus for optimal performance. In either case, as users search and find relevant results, the system can invisibly re-rank its previous best guesses using several knowledge sources, such as image similarity, text similarity, and temporal proximity. Experiments show that combining these extremes of human and machine power yields a significant improvement over either approach alone.
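The re-ranking step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Shot` fields, the three similarity functions (cosine for image features, Jaccard overlap for transcript words, exponential decay over shot distance for temporal proximity), the weights `(0.5, 0.3, 0.2)`, the `decay` value, and the max-over-relevant-shots fusion are all hypothetical choices made for clarity. It stands in a simple nearest-neighbor weighted fusion for whatever learners the paper actually uses.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Shot:
    """A video shot carrying the (hypothetical) features used for re-ranking."""
    shot_id: str
    video_id: str
    index: int                                    # position of the shot within its video
    image_vec: list[float]                        # e.g. a global color or texture feature
    terms: set[str] = field(default_factory=set)  # words from the shot's transcript


def image_sim(a: Shot, b: Shot) -> float:
    """Cosine similarity of the image feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a.image_vec, b.image_vec))
    na = math.sqrt(sum(x * x for x in a.image_vec))
    nb = math.sqrt(sum(y * y for y in b.image_vec))
    return dot / (na * nb) if na and nb else 0.0


def text_sim(a: Shot, b: Shot) -> float:
    """Jaccard overlap of transcript words."""
    union = a.terms | b.terms
    return len(a.terms & b.terms) / len(union) if union else 0.0


def temporal_sim(a: Shot, b: Shot, decay: float = 0.5) -> float:
    """Nearby shots in the same video score close to 1; shots in other videos score 0."""
    if a.video_id != b.video_id:
        return 0.0
    return math.exp(-decay * abs(a.index - b.index))


def rerank(pool, relevant, weights=(0.5, 0.3, 0.2)):
    """Sort unjudged shots by their best weighted similarity to any relevant shot."""
    if not relevant:
        return list(pool)  # no feedback yet: keep the initial (e.g. text-search) order
    w_img, w_txt, w_tmp = weights

    def score(shot):
        return max(
            w_img * image_sim(shot, r) + w_txt * text_sim(shot, r) + w_tmp * temporal_sim(shot, r)
            for r in relevant
        )

    return sorted(pool, key=score, reverse=True)


def next_page(pool, relevant, page_size):
    """Re-rank the remaining pool and split off the next page to show the user."""
    ranked = rerank(pool, relevant)
    return ranked[:page_size], ranked[page_size:]
```

In a session loop under these assumptions, the shots the user marks on each page are appended to `relevant`, the unshown remainder becomes the new `pool`, and `next_page` is called again. With manual paging `page_size` could change per request; with automatic paging it would stay fixed and the call would be driven by a timer.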

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '06: Proceedings of the 14th ACM international conference on Multimedia, October 2006, 1072 pages. ISBN: 1595934472. DOI: 10.1145/1180639

General Chairs: Klara Nahrstedt (UIUC), Matthew Turk (UCSB)

Program Chairs: Yong Rui (Microsoft Research), Wolfgang Klas (Universität Wien), Ketan Mayer-Patel (UNC)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Conference

MM06: The 14th ACM International Conference on Multimedia 2006, October 23-27, 2006, Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total citations: 84
Total downloads: 643
Downloads (last 12 months): 7
Downloads (last 6 weeks): 0

Reflects downloads up to 08 Mar 2025
