Abstract
Objects are often constrained to lie on a known plane. This paper concerns the pose determination and recognition of vehicles in traffic scenes, which under normal conditions stand on the ground-plane. The ground-plane constraint reduces the problem of localisation and recognition from six degrees of freedom (dof) to three.
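As an illustration of the reduction to three degrees of freedom, the sketch below parametrises the pose of an object resting on the ground-plane by a position (x, y) and a heading theta. It assumes a world frame whose plane z = 0 is the ground and whose z axis is the plane normal; the function names `ground_plane_pose` and `transform_point` are hypothetical and are not taken from the paper.

```python
import math


def ground_plane_pose(x, y, theta):
    """Return a 4x4 model-to-world transform for an object on the plane z = 0.

    Assumption: the only free parameters are the ground position (x, y) and
    the rotation theta about the plane normal (the world z axis).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, 0.0],  # height, roll and pitch are fixed by the plane
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(pose, point):
    """Map a 3D model point into world coordinates."""
    px, py, pz = point
    homogeneous = (px, py, pz, 1.0)
    return tuple(
        sum(row[k] * homogeneous[k] for k in range(4)) for row in pose[:3]
    )
```

A general rigid pose would need three rotation angles and three translations; here the height, roll and pitch are fixed by the plane, so only x, y and theta remain to be searched.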
The ground-plane constraint significantly reduces the pose redundancy of 2D image and 3D model line matches. A form of the generalised Hough transform is used in conjunction with explicit probability-based voting models to find consistent matches and to identify the approximate poses. The algorithms are applied to images of several outdoor traffic scenes and successful results are obtained. The work reported in this paper illustrates the efficiency and robustness of context-based vision in a practical application of computer vision.
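The idea of probability-weighted Hough voting can be sketched as follows. This is not the voting model of the paper, which the abstract does not specify. It assumes that each candidate line match yields an orientation estimate theta with a standard deviation sigma, and that each estimate casts Gaussian-weighted votes into a one-dimensional orientation accumulator. The names `vote_orientations` and `peak_orientation` and the default of 72 bins are illustrative choices.

```python
import math


def vote_orientations(estimates, n_bins=72):
    """Accumulate Gaussian-weighted votes over orientation bins in [0, 2*pi).

    `estimates` is an iterable of (theta, sigma) pairs in radians, one per
    hypothesised 2D-3D line match (an assumed input format).
    """
    bin_width = 2.0 * math.pi / n_bins
    accumulator = [0.0] * n_bins
    for theta, sigma in estimates:
        for b in range(n_bins):
            centre = (b + 0.5) * bin_width
            # Shortest signed angular distance, so votes wrap around at 2*pi.
            diff = (centre - theta + math.pi) % (2.0 * math.pi) - math.pi
            accumulator[b] += math.exp(-0.5 * (diff / sigma) ** 2)
    return accumulator


def peak_orientation(accumulator):
    """Return the centre of the bin that received the largest total vote."""
    n_bins = len(accumulator)
    best = max(range(n_bins), key=accumulator.__getitem__)
    return (best + 0.5) * 2.0 * math.pi / n_bins
```

Mutually consistent matches reinforce one another near a common orientation, while an incorrect match contributes an isolated, weak vote elsewhere, so the accumulator peak gives an approximate pose parameter for later refinement.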
Multiple cameras may be used to overcome the limitations of a single camera. Data fusion in the proposed algorithms is shown to be simple and straightforward.
Cite this article
Tan, T., Sullivan, G. & Baker, K. Model-Based Localisation and Recognition of Road Vehicles. International Journal of Computer Vision 27, 5–25 (1998). https://doi.org/10.1023/A:1007924428535
DOI: https://doi.org/10.1023/A:1007924428535