Abstract
The estimation of viewpoints and keypoints effectively enhance object detection methods by extracting valuable traits of the object instances. While the output of both processes differ, i.e., angles vs. list of characteristic points, they indeed share the same focus on how the object is placed in the scene, inducing that there is a certain level of correlation between them. Therefore, we propose a convolutional neural network that jointly computes the viewpoint and keypoints for different object categories. By training both tasks together, each task improves the accuracy of the other. Since the labelling of object keypoints is very time consuming for human annotators, we also introduce a new synthetic dataset with automatically generated viewpoint and keypoints annotations. Our proposed network can also be trained on datasets that contain viewpoint and keypoints annotations or only one of them. The experiments show that the proposed approach successfully exploits this implicit correlation between the tasks and outperforms previous techniques that are trained independently .
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Belagiannis, V., Zisserman, A.: Recurrent human pose estimation. In: IEEE International Conference on Automatic Face & Gesture Recognition, pp. 468–475 (2017)
Chang, A.X., et al.: Shapenet: An information-rich 3D model repository. CoRR abs/1512.3012 (2015)
Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5669–5678 (2017)
Divon, G., Tal, A.: Viewpoint estimation–insights & model. In: IEEE European Conference on Computer Vision, pp. 252–268 (2018)
Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
Fenzi, M., Leal-Taixe, L., Rosenhahn, B., Ostermann, J.: Class generative models based on feature regression for pose estimation of object categories. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 755–762 (2013)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)
Ghodrati, A., Pedersoli, M., Tuytelaars, T.: Is 2D information enough for viewpoint estimation? In: British Machine Vision Conference, pp. 1–12 (2014)
Grabner, A., Roth, P.M., Lepetit, V.: 3D pose estimation and 3D model retrieval for objects in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3022–3031 (2018)
He, K., Sigal, L., Sclaroff, S.: Parameterizing object detectors in the continuous pose space. In: IEEE European Conference on Computer Vision, pp. 450–465 (2014)
Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6D pose estimation great again. In: IEEE International Conference on Computer Vision, pp. 1521–1529 (2017)
Keys, R.G.: Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 29(6), 1153–1160 (1981)
Liebelt, J., Schmid, C.: Multi-view object class detection with a 3D geometric model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1688–1695 (2010)
Long, J.L., Zhang, N., Darrell, T.: Do convnets learn correspondence? In: Advances in Neural Information Processing Systems, pp. 1601–1609 (2014)
Massa, F., Marlet, R., Aubry, M.: Crafting a multi-task CNN for viewpoint estimation. In: British Machine Vision Conference (2016)
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: IEEE European Conference on Computer Vision, pp. 483–499 (2016)
Panareda Busto, P., Gall, J.: Viewpoint refinement and estimation with adapted synthetic data. Comput. Vis. Image Underst. 169, 75–89 (2018)
Panareda Busto, P., Liebelt, J., Gall, J.: Adaptation of synthetic data for coarse-to-fine viewpoint refinement. In: British Machine Vision Conference (2015)
Pavlakos, G., Zhou, X., Chan, A., Derpanis, K.G., Daniilidis, K.: 6-DoF object pose from semantic keypoints. In: IEEE International Conference on Robotics and Automation, pp. 2011–2018 (2017)
Peng, X., Sun, B., Ali, K., Saenko, K.: Learning deep object detectors from 3D models. In: IEEE International Conference on Computer Vision, pp. 1278–1286 (2015)
Pepik, B., Stark, M., Gehler, P., Schiele, B.: Teaching 3D geometry to deformable part models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3362–3369 (2012)
Pepik, B., Stark, M., Gehler, P., Ritschel, T., Schiele, B.: 3D object class detection in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition: Workshops, pp. 1–10 (2015)
Pishchulin, L., Jain, A., Wojek, C., Andriluka, M., Thormählen, T., Schiele, B.: Learning people detection models from few training samples. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1473–1480 (2011)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3D model views. In: IEEE International Conference on Computer Vision, pp. 2686–2694 (2015)
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 648–656 (2015)
Torki, M., Elgammal, A.: Regression from local features for viewpoint and pose estimation. In: IEEE International Conference on Computer Vision, pp. 2603–2610 (2011)
Toshev, A., Szegedy, C.: Deeppose: Human pose estimation via deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660 (2014)
Tulsiani, S., Malik, J.: Viewpoints and keypoints. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1510–1519 (2015)
Wang, Y., et al.: 3D pose estimation for fine-grained object categories. In: IEEE European Conference on Computer Vision: Workshops (2018)
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)
Wu, J., et al.: Single image 3d interpreter network. In: IEEE European Conference on Computer Vision, pp. 365–382 (2016)
Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3D object detection in the wild. In: IEEE Winter Conference on Applications of Computer Vision, pp. 75–82 (2014)
Xiang, Y., et al.: Objectnet3D: a large scale database for 3D object recognition. In: IEEE European Conference on Computer Vision, pp. 160–176 (2016)
Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1385–1392 (2011)
Zhou, X., Karpur, A., Luo, L., Huang, Q.: Starmap for category-agnostic keypoint and viewpoint estimation. In: IEEE European Conference on Computer Vision, pp. 318–334 (2018)
Zimmermann, C., Brox, T.: Learning to estimate 3D hand pose from single RGB images. In: IEEE International Conference on Computer Vision, pp. 4903–4911 (2017)
Acknowledgement
The work has been supported by the ERC Starting Grant ARCA (677650).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Panareda Busto, P., Gall, J. (2019). Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science(), vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-33676-9_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33675-2
Online ISBN: 978-3-030-33676-9
eBook Packages: Computer ScienceComputer Science (R0)