Abstract
In this paper, we present an algorithm for multi-view recognition in a distributed camera setting that learns which viewpoints are most discriminative for particular instances of ambiguity. Our method is built on top of 2D recognition algorithms and casts view selection as the problem of optimizing kernel weights in multiple kernel learning. The main contribution is a locality-sensitive meta-training step to learn a disambiguation function to select the relative weighting of available viewpoints needed to classify a 2D input example. Our method outperforms related approaches on benchmark multi-view action recognition data sets.
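To make the kernel-weighting view of the problem concrete, the following is a minimal sketch, not the authors' implementation. It assumes each camera view contributes its own kernel matrix and that view selection amounts to choosing non-negative weights for a weighted sum of those kernels. The function names `rbf_kernel` and `combine_view_kernels`, the Gaussian kernel, the random features, and the hand-picked weights are all illustrative assumptions; in the paper the weights are learned per example by the disambiguation function.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def combine_view_kernels(kernels, weights):
    """Return the weighted sum of per-view kernels and the normalized weights.

    Weights must be non-negative; they are rescaled to sum to 1, so the
    result is a convex combination and remains a valid kernel matrix.
    """
    w = np.asarray(weights, dtype=float)
    if len(w) != len(kernels):
        raise ValueError("need exactly one weight per view")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    combined = sum(wi * K for wi, K in zip(w, kernels))
    return combined, w

# Hypothetical data: 3 views, 5 examples, 3-dimensional features per view.
rng = np.random.default_rng(0)
view_features = [rng.normal(size=(5, 3)) for _ in range(3)]
view_kernels = [rbf_kernel(F, F) for F in view_features]

# Hand-picked weights standing in for learned ones: view 0 counts double.
K_combined, view_weights = combine_view_kernels(view_kernels, [2.0, 1.0, 1.0])
```

Setting a single weight to 1 and the rest to 0 reduces the combination to hard selection of one view, while intermediate weights give the relative weighting of viewpoints described above.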
Notes
1. For illustration, Fig. 2 depicts a 2-dimensional feature space, but each step of our procedure can be kernelized, so a direct feature vector representation of the data is not necessary.
2. We were unable to directly compare these results with those previously reported. Unlike with the IXMAS set, there is little agreement in the literature on an experimental protocol for 3DO.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Spurlock, S., Wu, H., Souvenir, R. (2015). Multi-view Recognition Using Weighted View Selection. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9006. Springer, Cham. https://doi.org/10.1007/978-3-319-16817-3_35
DOI: https://doi.org/10.1007/978-3-319-16817-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16816-6
Online ISBN: 978-3-319-16817-3
eBook Packages: Computer Science (R0)