Abstract
This paper addresses the problem of scene understanding for driver assistance systems. To recognize the large number of objects that may be found on the road, several sensors and decision algorithms have to be used. The proposed approach is based on the representation of all available information in over-segmented image regions. The main novelty of the framework is its capability to incorporate new classes of objects and to include new sensors or detection methods while remaining robust to sensor failures. Several classes, such as ground, vegetation or sky, are considered, as well as three different sensors. The approach was evaluated on real, publicly available urban driving scene data.
References
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
Badino, H., Franke, U., Mester, R.: Free space computation using stochastic occupancy grids and dynamic programming. In: Proceedings of International Conference on Computer Vision Workshop on Dynamical Vision, Rio de Janeiro (2007)
Bansal, M., Sang-Hack, J., Bogdan, M., Jayana, E., Harpreet, S.S.: A real-time pedestrian detection system based on structure and appearance classification. In: Proceedings of IEEE International Conference on Robotics and Automation, Anchorage, pp. 903–909 (2010)
Barnett, J.A.: Calculating Dempster–Shafer plausibility. IEEE Trans. Pattern Anal. Mach. Intell. 13(6), 599–602 (1991)
Bordes, J.B., Davoine, F., Xu, P., Denœux, T.: Evidential grammars for image interpretation - Application to multimodal traffic scene understanding. In: Qin, Z., Huynh, V.N. (eds.) Integrated Uncertainty in Knowledge Modelling and Decision Making, Beijing, pp. 65–78 (2013)
Cobb, B.R., Shenoy, P.P.: On the plausibility transformation method for translating belief function models to probability models. Int. J. Approx. Reason. 41(3), 314–330 (2006)
Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Denœux, T.: Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognit. 30(7), 1095–1107 (1997)
Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34(4), 743–761 (2011)
Dubois, D., Prade, H., Smets, P.: A definition of subjective possibility. Int. J. Approx. Reason. 48(2), 352–364 (2008)
Ess, A., Müller, T., Grabner, H., Van Gool, L.: Segmentation based urban traffic scene understanding. In: Proceedings of British Machine Vision Conference, London, pp. 84.1–84.11 (2009)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
Fröhlich, B., Rodner, E., Kemmler, M., Denzler, J.: Large-scale Gaussian process multi-class classification for semantic segmentation and facade recognition. Mach. Vis. Appl. 24(5), 1043–1053 (2013)
Geiger, A., Lenz, P., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)
Geiger, A., Roser, M., Urtasun, R.: Efficient large-scale stereo matching. In: Proceedings of Asian Conference on Computer Vision, Queenstown, pp. 25–38 (2010)
Hoiem, D., Efros, A., Hebert, M.: Recovering surface layout from an image. Int. J. Comput. Vis. 75(1), 151–172 (2007)
Khaleghi, B., Khamis, A., Karray, F.O., Razavi, S.N.: Multisensor data fusion: a review of the state-of-the-art. Inf. Fusion 14, 28–44 (2013)
Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)
Ladický, L., Sturgess, P., Russell, C., Sengupta, S., Bastanlar, Y., Clocksin, W., Torr, P.H.S.: Joint optimisation for object class segmentation and dense stereo reconstruction. Int. J. Comput. Vis. 100(2), 122–133 (2012)
Leibe, B., Cornelis, N., Cornelis, K., Van Gool, L.: Dynamic 3D scene analysis from a moving vehicle. In: Proceedings of IEEE Computer Vision and Pattern Recognition, Minneapolis, pp. 1–8 (2007)
Lin, H.T., Lin, C.J., Weng, R.C.: A note on Platt's probabilistic outputs for support vector machines. Mach. Learn. 68(3), 267–276 (2007)
Quost, B., Masson, M.H., Denœux, T.: Classifier fusion in the Dempster–Shafer framework using optimized t-norm based combination rules. Int. J. Approx. Reason. 52(3), 353–374 (2011)
Ren, C.Y., Reid, I.: gSLIC: a real-time implementation of SLIC superpixel segmentation. Technical report, University of Oxford, Department of Engineering Science (2011)
Rodríguez, S.A., Frémont, V., Bonnifait, P., Cherfaoui, V.: Multi-modal object detection and localization for high integrity driving assistance. Mach. Vis. Appl. 14, 1–16 (2011)
Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)
Smets, P.: Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem. Int. J. Approx. Reason. 9(1), 1–35 (1993)
Smets, P., Kennes, R.: The transferable belief model. Artif. Intell. 66, 191–243 (1994)
Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, Cambridge (2005)
Walley, P.: Statistical reasoning with imprecise probabilities. Chapman and Hall, New York (1991)
Wang, C.C., Thorpe, C., Thrun, S., Hebert, M., Durrant-Whyte, H.: Simultaneous localization, mapping and moving object tracking. Int. J. Robot. Res. 26(9), 889–916 (2007)
Wedel, A., Badino, H., Rabe, C., Loose, H., Franke, U., Cremers, D.: B-spline modeling of road surfaces with an application to free-space estimation. IEEE Trans. Intell. Transp. Syst. 10(4), 572–583 (2009)
Werlberger, M.: Convex approaches for high performance video processing. Ph.D. thesis, Institute for Computer Graphics and Vision, Graz University of Technology, Graz (2012)
Wojek, C., Schiele, B.: A dynamic conditional random field model for joint labeling of object and scene classes. In: Proceedings of European Conference on Computer Vision, pp. 733–747 (2008)
Xu, P., Davoine, F., Bordes, J.B., Zhao, H., Denœux, T.: Information fusion on oversegmented images: an application for urban scene understanding. In: Proceedings of International Conference on Machine Vision and Application, Kyoto, pp. 189–193 (2013)
Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: Hamprecht, F., Schnörr, C., Jähne, B. (eds.) Pattern Recognition. Lecture Notes in Computer Science, vol. 4713, pp. 214–223. Springer, Berlin (2007)
Acknowledgments
This work was carried out in the framework of the Labex MS2T, which was funded by the French Government, through the program “Investments for the future” managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02). It was supported and funded by the Cai Yuanpei project 26193PE from the Chinese Ministry of Education, the French Ministry of Foreign and European Affairs and the French Ministry of Higher Education and Research. It was also supported by the ANR-NSFC Sino-French PRETIV project 61161130528 / ANR-11-IS03-0001.
Additional information
This paper is a revised and extended version of [35].
Appendix A: Decision making
In our case study, several arguments favor the optimistic strategy: it is often more conclusive than the pessimistic strategy, coherent with frame refinement, and computationally efficient. As shown by Barnett [4], given \(k\) plausibility functions, finding the singleton with maximum plausibility under the combined function requires only \(O(k|\varOmega |)\) operations, whereas doing the same for the belief requires \(O(|\varOmega |^k)\) operations. We indeed have the following property:
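A standard statement of this property, given here as a sketch under the assumption that the \(k\) mass functions are combined with Dempster's rule (the symbols \(pl_i\) and \(\oplus\) are notation assumed for illustration), is that the plausibility of a singleton under the combination factorizes over the sources:

\[
pl_{1 \oplus \cdots \oplus k}(\{\omega\}) \;\propto\; \prod_{i=1}^{k} pl_i(\{\omega\}), \qquad \forall \omega \in \varOmega .
\]

Since the normalization constant does not depend on \(\omega\), the most plausible singleton can be found from the \(k|\varOmega |\) individual values \(pl_i(\{\omega\})\) alone.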
To compute the pignistic probabilities, the combined mass functions need to be explicitly computed, which requires a number of operations exponential in \(|\varOmega |\).
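The difference in cost can be illustrated with a minimal Python sketch. It assumes mass functions are stored as dictionaries mapping focal sets to masses and that they are combined conjunctively (Dempster's rule up to normalization). The frame, the three mass functions and all function names below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product
from math import prod

def contour(m, frame):
    """Plausibility of each singleton: total mass of the focal sets containing it."""
    return {w: sum(v for A, v in m.items() if w in A) for w in frame}

def max_plausibility_singleton(masses, frame):
    """Most plausible singleton of the combination, obtained by multiplying
    the individual singleton plausibilities (no explicit combination)."""
    contours = [contour(m, frame) for m in masses]
    score = {w: prod(c[w] for c in contours) for w in frame}
    return max(score, key=score.get), score

def conjunctive_combination(masses):
    """Explicit unnormalized combination, for comparison only: it enumerates
    every tuple of focal sets, one per source."""
    combined = {}
    for focal in product(*(m.items() for m in masses)):
        inter = frozenset.intersection(*(A for A, _ in focal))
        if inter:  # mass falling on the empty set (conflict) is dropped
            combined[inter] = combined.get(inter, 0.0) + prod(v for _, v in focal)
    return combined

# Hypothetical frame and sources (illustrative values only).
frame = ("grass", "road", "other")
m1 = {frozenset({"grass", "other"}): 0.6, frozenset(frame): 0.4}
m2 = {frozenset({"road", "other"}): 0.7, frozenset(frame): 0.3}
m3 = {frozenset({"grass", "road"}): 0.5, frozenset(frame): 0.5}

best, score = max_plausibility_singleton([m1, m2, m3], frame)
explicit = contour(conjunctive_combination([m1, m2, m3]), frame)
print(best, score, explicit)
```

Both routes give the same singleton plausibilities up to the common normalization factor, but only the explicit combination has to enumerate intersections of focal sets.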
To show the differences between the decision-making strategies, let us consider the following mass function defined on \(\varOmega \):
Table 7 shows the beliefs, plausibilities and pignistic probabilities on the singletons. Here, the pessimistic strategy cannot lead to any decision: actually, in the worst case scenario, any decision could be wrong given the current mass function. Choosing {grass} instead of would be wrong if the masses and were actually entirely related to . Inversely, the other decision would also be wrong if the same masses were now related respectively to {grass} and {road}. On the other hand, both \(pl^\varOmega \) and Bet\(P^\varOmega \) would lead to , which seems quite reasonable.
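The three criteria can be sketched in Python as follows. The mass values and the class name "other" are hypothetical stand-ins chosen to reproduce the qualitative behaviour described above; they are not the values of the paper's mass function or of Table 7. The mass function is assumed to be normalized (no mass on the empty set).

```python
def bel(m, w):
    """Belief of the singleton {w}: mass of the non-empty subsets of {w}."""
    return sum(v for A, v in m.items() if A and A <= frozenset({w}))

def pl(m, w):
    """Plausibility of {w}: mass of the focal sets that contain w."""
    return sum(v for A, v in m.items() if w in A)

def betp(m, w):
    """Pignistic probability of w: each focal set shares its mass equally
    among its elements."""
    return sum(v / len(A) for A, v in m.items() if w in A)

# Hypothetical frame and mass function (illustrative values only).
omega = ("grass", "road", "other")
m = {
    frozenset({"grass", "other"}): 0.20,
    frozenset({"road", "other"}): 0.30,
    frozenset({"grass", "road"}): 0.15,
    frozenset(omega): 0.35,
}

beliefs = {w: bel(m, w) for w in omega}
optimistic = max(omega, key=lambda w: pl(m, w))
pignistic = max(omega, key=lambda w: betp(m, w))
print(beliefs, optimistic, pignistic)
```

With these illustrative masses every singleton has zero belief, so the pessimistic rule cannot separate the classes, while the plausibility and pignistic criteria both select the same class.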
Now, if the singleton is refined into {tree, obstacle, sky}, the mass function (35) will simply be rewritten as:
Table 8 shows the measures induced by this new mass function. Following Bet\(P^\varTheta \), the decision is changed and now leads to {road}. In contrast, the plausibility criterion does not discriminate between {tree}, {obstacle} and {sky}, which are still more plausible than {grass} and {road}. The optimistic strategy thus remains coherent with its previous decision.
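The effect of a refinement on the two criteria can be checked with a short Python sketch. The coarse class name "other", the mass values and the helper names are hypothetical and serve only to reproduce the qualitative behaviour described above; they are not the values behind Table 8.

```python
def refine(m, rho):
    """Carry a mass function to a finer frame: each focal set is replaced by
    the union of the images of its elements under the refining rho."""
    out = {}
    for A, v in m.items():
        B = frozenset().union(*(rho[w] for w in A))
        out[B] = out.get(B, 0.0) + v
    return out

def pl(m, w):
    """Plausibility of the singleton {w}."""
    return sum(v for A, v in m.items() if w in A)

def betp(m, w):
    """Pignistic probability of w (normalized mass function assumed)."""
    return sum(v / len(A) for A, v in m.items() if w in A)

# Hypothetical coarse mass function (illustrative values only).
m = {
    frozenset({"grass", "other"}): 0.20,
    frozenset({"road", "other"}): 0.30,
    frozenset({"grass", "road"}): 0.15,
    frozenset({"grass", "road", "other"}): 0.35,
}
# Refining: "other" is split into three finer classes.
rho = {"grass": {"grass"}, "road": {"road"},
       "other": {"tree", "obstacle", "sky"}}
theta = ("grass", "road", "tree", "obstacle", "sky")

m_fine = refine(m, rho)
pl_fine = {w: pl(m_fine, w) for w in theta}
betp_choice = max(theta, key=lambda w: betp(m_fine, w))
print(pl_fine, betp_choice)
```

In this sketch the pignistic choice moves to "road" after the split, because the mass shared by the refined classes is divided among more elements, whereas the three refined classes keep the same plausibility that "other" had and stay ahead of "grass" and "road".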
Cite this article
Xu, P., Davoine, F., Bordes, J.B. et al. Multimodal information fusion for urban scene understanding. Machine Vision and Applications 27, 331–349 (2016). https://doi.org/10.1007/s00138-014-0649-7
DOI: https://doi.org/10.1007/s00138-014-0649-7