Abstract
Combining audio and image processing to understand video content has several benefits over using either modality on its own. For context and activity recognition in video sequences, it is important to explore both data streams to gather relevant information. In this paper we describe a video context and activity recognition model. Our approach extracts a range of audio and visual features, then applies feature reduction and information fusion. We show that combining audio-based with video-based decision making improves the quality of context and activity recognition in videos by 4% over audio data alone and 18% over image data alone.
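To make the idea of combining audio-based and video-based decisions concrete, the following is a minimal sketch of decision-level fusion. The abstract does not state which fusion rule the paper uses, so the weighted sum rule here is an assumption chosen for illustration. The class labels, the scores, and the names `fuse_posteriors`, `predict`, and `audio_weight` are likewise hypothetical.

```python
from typing import List, Sequence


def fuse_posteriors(audio: Sequence[float],
                    visual: Sequence[float],
                    audio_weight: float = 0.5) -> List[float]:
    """Weighted sum-rule fusion of two per-class score vectors.

    Assumption: each modality has its own classifier that outputs one
    non-negative score per class, in the same class order.
    """
    if len(audio) != len(visual):
        raise ValueError("both modalities must score the same set of classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")

    # Blend the two modalities class by class.
    fused = [audio_weight * a + (1.0 - audio_weight) * v
             for a, v in zip(audio, visual)]

    # Renormalise so the fused scores sum to 1.
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    return [f / total for f in fused]


def predict(labels: Sequence[str],
            audio: Sequence[float],
            visual: Sequence[float],
            audio_weight: float = 0.5) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_posteriors(audio, visual, audio_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical classes and scores, not taken from the paper.
labels = ["interview", "sports", "traffic"]
audio_scores = [0.5, 0.3, 0.2]   # audio classifier favours "interview"
visual_scores = [0.2, 0.6, 0.2]  # visual classifier favours "sports"

fused_label = predict(labels, audio_scores, visual_scores)
audio_only_label = predict(labels, audio_scores, visual_scores, audio_weight=1.0)
```

With equal weights the stronger visual evidence decides the fused label, while `audio_weight=1.0` reduces the scheme to the audio classifier alone. In practice the weight would be tuned on held-out data.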
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lopes, J., Singh, S. (2006). Audio and Video Feature Fusion for Activity Recognition in Unconstrained Videos. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_99
DOI: https://doi.org/10.1007/11875581_99
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science (R0)