Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos

Lopes, José; Singh, Sameer

doi:10.1007/11875581_99

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Combining audio and image processing to understand video content has several benefits over using either modality on its own. For context and activity recognition in video sequences, it is important to exploit both data streams to gather relevant information. In this paper we describe a video context and activity recognition model. Our approach extracts a range of audio and visual features, then applies feature reduction and information fusion. We show that combining audio-based and video-based decision making improves the quality of context and activity recognition in videos by 4% over audio data alone and 18% over image data alone.
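
The abstract does not say which fusion rule the authors use, so the following is only a minimal sketch of one common way to combine audio-based and video-based decisions: averaging (or multiplying) the per-class posteriors of two classifiers and picking the highest fused score. The function name `fuse_posteriors`, the three activity classes, and all probability values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def fuse_posteriors(audio_probs, video_probs, rule="sum"):
    """Combine per-class posteriors from two modalities into one decision.

    Illustrative only: the rule names and this function are assumptions,
    not the method described in the paper.
    """
    audio = np.asarray(audio_probs, dtype=float)
    video = np.asarray(video_probs, dtype=float)
    if audio.shape != video.shape:
        raise ValueError("both modalities must score the same classes")

    if rule == "sum":
        # Sum rule: average the two posterior estimates.
        fused = (audio + video) / 2.0
    elif rule == "product":
        # Product rule: multiply the two posterior estimates.
        fused = audio * video
    else:
        raise ValueError(f"unknown rule: {rule}")

    # Renormalise so the fused scores sum to 1 over the classes.
    fused = fused / fused.sum(axis=-1, keepdims=True)
    return fused, np.argmax(fused, axis=-1)


# Hypothetical posteriors over three made-up activity classes.
audio = [0.5, 0.3, 0.2]  # the audio classifier prefers class 0
video = [0.2, 0.5, 0.3]  # the video classifier prefers class 1
fused, label = fuse_posteriors(audio, video, rule="sum")
print(fused, label)
```

In this toy case the two modalities disagree, and the fused decision follows whichever class has the larger combined support. A real system would obtain the posteriors from classifiers trained on the reduced audio and visual feature sets.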

Author information

Authors and Affiliations

Research School of Informatics, Loughborough University, Loughborough, LE11 3TU, UK
José Lopes & Sameer Singh

Editor information

Editors and Affiliations

Escuela Politécnica Superior, GICAP Research Group, Universidad de Burgos, Calle Francisco de Vitoria S/N, Edificio C, Campus Vena, 09006, Burgos, Spain
Emilio Corchado
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
Department of Information Systems and Computation, Technical University of Valencia, Camino de Vera, Valencia, Spain
Vicente Botti
University of West Scotland, PA1 2BE, Paisley, Scotland
Colin Fyfe

About this paper

Cite this paper

Lopes, J., Singh, S. (2006). Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_99

DOI: https://doi.org/10.1007/11875581_99
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)
