Abstract
Various object representations have been widely used for tasks such as object detection, recognition, and tracking. Most of them require an intensive training process on a large database collected in advance, and it is hard to add a model of a previously unobserved object that is not in the database. In this paper, we investigate how to create a representation of a new and unknown object online, and how to apply it to practical applications such as object detection and tracking. To make this viable, we use a sensor fusion approach that combines a camera and a single-line scan LIDAR. The proposed representation consists of an approximated geometry model and a viewpoint- and scale-invariant appearance model, which makes it extremely simple to match the model against an observation. This property makes it possible to model a new object online and provides robustness to viewpoint variation and occlusion. The representation has the benefits of both an implicit model (referred to as a view-based model) and an explicit model (referred to as a shape-based model). Extensive experiments using synthetic and real data demonstrate the viability of the proposed object representation in both modeling and detecting/tracking objects.
Additional information
Communicated by V. Lepetit.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (wmv 6346 KB)
Cite this article
Kwak, K., Kim, JS., Huber, D.F. et al. Online Approximate Model Representation Based on Scale-Normalized and Fronto-Parallel Appearance. Int J Comput Vis 117, 48–69 (2016). https://doi.org/10.1007/s11263-015-0848-3
DOI: https://doi.org/10.1007/s11263-015-0848-3