Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection

Published: 06 June 2016


Multimedia event detection (MED) is the task of detecting given events (e.g. parade, birthday party) in a large collection of video clips. While the most useful information comes from visual features and speech recognition, a lot can also be inferred from the non-speech audio content, either alone or in conjunction with visual and speech cues. This paper studies MED with non-speech audio information only. MED is usually performed in two stages. The first stage generates a representation for each clip in the form of either a single vector or a sequence of vectors, often by aggregating frame-level features; the second stage performs binary or multi-class classification to decide whether each target event occurs in each clip. Common classifiers used for the second stage include support vector machines (SVMs), feed-forward deep neural networks (DNNs), and recurrent neural networks (RNNs).
In this paper, we propose to classify clips for events using "recurrent SVMs". These models combine the kernel mapping and large-margin optimization criterion of SVMs with the ability of RNNs to process variable-length sequences. Reinforced with data augmentation, recurrent SVMs achieved higher mean average precision (MAP) on the TRECVID 2011 MED task than both SVMs and RNNs.
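The abstract does not give the model's architecture, so the following is only a minimal sketch of the general idea of pairing a recurrent network with a large-margin (hinge) loss. It is not the authors' implementation. The function names, the plain tanh recurrence, the use of the last hidden state as the clip score, and the toy dimensions are all assumptions made for illustration, and the kernel mapping component is omitted entirely.

```python
import math
import random


def rnn_score(frames, W_in, W_rec, w_out):
    """Run a plain tanh RNN over one clip (a list of frame vectors of any
    length) and return a scalar score read from the last hidden state.
    Hypothetical helper: the paper's actual architecture may differ."""
    hidden = [0.0] * len(W_rec)
    for x in frames:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(W_in[j], x))
                + sum(wr * hk for wr, hk in zip(W_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return sum(w * h for w, h in zip(w_out, hidden))


def hinge_loss(score, label):
    """Large-margin loss for a label in {-1, +1}.
    The loss is zero once label * score reaches the margin of 1."""
    return max(0.0, 1.0 - label * score)


# Toy setup (assumed sizes): 3-dimensional frame features, 4 hidden units.
random.seed(0)
n_in, n_hid = 3, 4
W_in = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
W_rec = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_hid)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]

# Two clips of different lengths, one positive and one negative for an event.
short_clip = [[0.1, 0.2, 0.3]] * 2
long_clip = [[0.3, 0.1, 0.0]] * 7
losses = [
    hinge_loss(rnn_score(clip, W_in, W_rec, w_out), label)
    for clip, label in [(short_clip, +1), (long_clip, -1)]
]
```

In this sketch the same weights score clips of 2 and 7 frames without padding or truncation, which is the variable-length property the abstract attributes to RNNs. The hinge loss replaces the cross-entropy loss more commonly used to train such networks. Training (backpropagating the hinge loss through the recurrence) is left out to keep the example short.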




Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
Association for Computing Machinery

New York, NY, United States



Author Tags

  1. data augmentation
  2. hinge loss
  3. kernel mapping
  4. large margin
  5. multimedia event detection (MED)
  6. noisemes
  7. recurrent neural networks (RNNs)
  8. support vector machines (SVMs)


ICMR'16: International Conference on Multimedia Retrieval
June 6 - 9, 2016
New York, New York, USA

Acceptance Rates

ICMR '16 Paper Acceptance Rate: 20 of 120 submissions, 17%
Overall Acceptance Rate: 254 of 830 submissions, 31%


