DOI: 10.1145/2509896.2509905
research-article

Cross-modal alignment for wildlife recognition

Published: 22 October 2013

Abstract

We propose an unsupervised framework for recognizing animals in videos using subtitles. In this framework, animals are aligned with their names by an Expectation-Maximization (EM) algorithm adapted to two very different settings: (1) when bounding boxes are available, and (2) when the whole frame is used in place of bounding boxes. With the goal of maximizing precision, recall, and F-measure, the experiments compare a range of natural language processing approaches and visual features for associating animal names in the subtitles with visual patterns. In a fully automated pipeline, the proposed unsupervised methods obtain 83.1% F1 with bounding boxes and 65.7% F1 without them.
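
The abstract does not spell out the alignment model, so the following is only an illustrative sketch of how an EM-based name-to-region alignment can work, not the authors' implementation. It assumes that each video segment supplies a set of animal names extracted from the subtitles and a set of discrete visual cluster labels (for example, quantized bounding-box features), and it estimates p(visual | name) in the style of a simple translation model. The function names `em_align` and `best_name`, the cluster labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def em_align(samples, n_iter=20):
    """Estimate p(visual | name) from co-occurring names and visual labels.

    samples: list of (names, visual_ids) pairs, one per video segment.
    This is an assumed, simplified model, not the method from the paper.
    """
    names = sorted({n for ns, _ in samples for n in ns})
    visuals = sorted({v for _, vs in samples for v in vs})
    # Start from a uniform distribution over visual labels for every name.
    p = {n: {v: 1.0 / len(visuals) for v in visuals} for n in names}

    for _ in range(n_iter):
        counts = {n: defaultdict(float) for n in names}
        # E-step: softly assign each visual label to the names in its segment.
        for ns, vs in samples:
            if not ns:
                continue  # a segment without names carries no alignment signal
            for v in vs:
                z = sum(p[n][v] for n in ns)
                if z == 0.0:
                    continue
                for n in ns:
                    counts[n][v] += p[n][v] / z
        # M-step: renormalize the expected counts into probabilities.
        for n in names:
            total = sum(counts[n].values())
            if total > 0.0:
                p[n] = {v: counts[n][v] / total for v in visuals}
    return p


def best_name(p, visual_id):
    """Pick the name under which the visual label is most likely."""
    return max(p, key=lambda n: p[n][visual_id])


# Toy data: names from subtitles paired with hypothetical cluster labels.
samples = [
    (["lion", "zebra"], ["A", "B"]),
    (["lion"], ["A"]),
    (["zebra", "elephant"], ["B", "C"]),
    (["elephant"], ["C"]),
]
model = em_align(samples)
for label in ["A", "B", "C"]:
    print(label, "->", best_name(model, label))
```

In this sketch the segments that mention only one name act as anchors, which lets the ambiguous segments resolve over the iterations. A whole-frame variant would, under the same assumptions, replace the per-box cluster labels with labels computed from the full frame.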

Cited By

  • (2017) Entity linking across vision and language. Multimedia Tools and Applications 76:21 (22599-22622). DOI: 10.1007/s11042-017-4732-8. Online publication date: 1-Nov-2017
  • (2013) Summary abstract for the 2nd ACM international workshop on multimedia analysis for ecological data. Proceedings of the 21st ACM international conference on Multimedia (1101-1102). DOI: 10.1145/2502081.2503834. Online publication date: 21-Oct-2013

Published In

MAED '13: Proceedings of the 2nd ACM international workshop on Multimedia analysis for ecological data
October 2013
54 pages
ISBN: 9781450324014
DOI: 10.1145/2509896
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-modal alignment
  2. em algorithm
  3. wildlife recognition

Qualifiers

  • Research-article

Conference

MM '13
MM '13: ACM Multimedia Conference
October 22, 2013
Barcelona, Spain

Acceptance Rates

MAED '13 Paper Acceptance Rate 7 of 12 submissions, 58%
Overall Acceptance Rate 13 of 23 submissions, 57%

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 20 Feb 2025
