E-LAMP: integration of innovative ideas for multimedia event detection

  • Special Issue Paper
  • Machine Vision and Applications

Abstract

Detecting multimedia events in web videos is an emerging research area in multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we recently developed for our Event Labeling through Analytic Media Processing (E-LAMP) system, which addresses different aspects of the overall event detection problem. First, we have developed efficient methods for feature extraction so that we can handle large collections containing thousands of hours of video. Second, we represent the extracted raw features in a spatial bag-of-words model with more effective tilings, so that the spatial layout information of different features and different events is better captured and overall detection performance improves. Third, in contrast to the widely used early and late fusion schemes, we develop a novel algorithm that learns a more robust and discriminative intermediate feature representation from multiple features, so that better event models can be built upon it. Finally, to tackle the additional challenge of event detection with only very few positive exemplars, we have developed a novel algorithm that effectively adapts the knowledge learnt from auxiliary sources to assist event detection. Both our empirical results and the official evaluation results on TRECVID MED’11 and MED’12 demonstrate the excellent performance of the integration of these ideas.
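
To make the spatial bag-of-words idea concrete, the following is a minimal sketch and not the E-LAMP implementation. It assumes keypoints have already been quantized to visual-word indices and that their coordinates are normalized to [0, 1). The function name `spatial_bow` and the default tilings (1x1, 2x2, and three horizontal bands) are illustrative assumptions; the abstract does not state which tilings the system uses.

```python
import numpy as np


def spatial_bow(positions, words, vocab_size, tilings=((1, 1), (2, 2), (3, 1))):
    """Concatenate per-cell visual-word histograms over several tilings.

    positions: (n, 2) array of (x, y) keypoint coordinates in [0, 1).
    words: (n,) array of visual-word indices in [0, vocab_size).
    tilings: iterable of (rows, cols) grids; the defaults are hypothetical,
             chosen only for illustration.
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for rows, cols in tilings:
        # Map each keypoint to the grid cell it falls in.
        col = np.minimum((positions[:, 0] * cols).astype(int), cols - 1)
        row = np.minimum((positions[:, 1] * rows).astype(int), rows - 1)
        cell = row * cols + col
        # One histogram per cell, stored as a (cells, vocab_size) count table.
        hist = np.zeros((rows * cols, vocab_size))
        np.add.at(hist, (cell, words), 1.0)
        parts.append(hist.ravel())
    feature = np.concatenate(parts)
    total = feature.sum()
    # L1-normalize so videos with different keypoint counts are comparable.
    return feature / total if total > 0 else feature


# Two keypoints, vocabulary of two words: 1 + 4 + 3 cells, 2 bins each.
feature = spatial_bow([[0.1, 0.1], [0.9, 0.9]], [0, 1], vocab_size=2)
print(feature.shape)
```

Concatenating histograms from several tilings lets a downstream classifier see both where a visual word occurs and how often it occurs overall, which is the kind of layout information a plain bag-of-words histogram discards.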



Acknowledgments

This work is supported in part by the National Science Foundation under Grant IIS-0917072 and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20068. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US Government.

Author information

Corresponding author

Correspondence to Wei Tong.

About this article

Cite this article

Tong, W., Yang, Y., Jiang, L. et al. E-LAMP: integration of innovative ideas for multimedia event detection. Machine Vision and Applications 25, 5–15 (2014). https://doi.org/10.1007/s00138-013-0529-6

  • DOI: https://doi.org/10.1007/s00138-013-0529-6
