Multimodal information fusion for urban scene understanding

  • Special Issue Paper
  • Machine Vision and Applications

Abstract

This paper addresses the problem of scene understanding for driver assistance systems. To recognize the large number of objects that may be found on the road, several sensors and decision algorithms have to be used. The proposed approach is based on the representation of all available information in over-segmented image regions. The main novelty of the framework is its capability to incorporate new classes of objects and to include new sensors or detection methods while remaining robust to sensor failures. Several classes such as ground, vegetation or sky are considered, as well as three different sensors. The approach was evaluated on real publicly available urban driving scene data.

Notes

  1. http://www.cs.uiuc.edu/~dhoiem.

  2. https://www.hds.utc.fr/~xuphilip/dokuwiki/en/data.

  3. http://ivrg.epfl.ch/supplementary_material/RK_SLICSuperpixels/index.html.

References

  1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)

  2. Badino, H., Franke, U., Mester, R.: Free space computation using stochastic occupancy grids and dynamic programming. In: Proceedings of International Conference on Computer Vision Workshop on Dynamical Vision, Rio de Janeiro (2007)

  3. Bansal, M., Sang-Hack, J., Bogdan, M., Jayana, E., Harpreet, S.S.: A real-time pedestrian detection system based on structure and appearance classification. In: Proceedings of IEEE International Conference on Robotics and Automation, Anchorage, pp. 903–909 (2010)

  4. Barnett, J.A.: Calculating Dempster–Shafer plausibility. IEEE Trans. Pattern Anal. Mach. Intell. 13(6), 599–602 (1991)

  5. Bordes, J.B., Davoine, F., Xu, P., Denœux, T.: Evidential grammars for image interpretation - Application to multimodal traffic scene understanding. In: Qin, Z., Huyn, V.N. (eds.) Integrated Uncertainty in Knowledge Modelling and Decision Making, Beijing, pp. 65–78 (2013)

  6. Cobb, B.R., Shenoy, P.P.: On the plausibility transformation method for translating belief function models to probability models. Int. J. Approx. Reason. 41(3), 314–330 (2006)

  7. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)

  8. Denœux, T.: Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognit. 30(7), 1095–1107 (1997)

  9. Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34(4), 743–761 (2011)

  10. Dubois, D., Prade, H., Smets, P.: A definition of subjective possibility. Int. J. Approx. Reason. 48(2), 352–364 (2008)

  11. Ess, A., Müller, T., Grabner, H., Van Gool, L.: Segmentation based urban traffic scene understanding. In: Proceedings of British Machine Vision Conference, London, pp. 84.1–84.11 (2009)

  12. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

  13. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)

  14. Fröhlich, B., Rodner, E., Kemmler, M., Denzler, J.: Large-scale gaussian process multi-class classification for semantic segmentation and facade recognition. Mach. Vis. Appl. 24(5), 1043–1053 (2013)

  15. Geiger, A., Lenz, P., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)

  16. Geiger, A., Roser, M., Urtasun, R.: Efficient large-scale stereo matching. In: Proceedings of Asian Conference on Computer Vision, Queenstown, pp. 25–38 (2010)

  17. Hoiem, D., Efros, A., Hebert, M.: Recovering surface layout from an image. Int. J. Comput. Vis. 75(1), 151–172 (2007)

  18. Khaleghi, B., Khamis, A., Karray, F.O., Razavi, S.N.: Multisensor data fusion: a review of the state-of-the-art. Inf. Fusion 14, 28–44 (2013)

  19. Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)

  20. Ladický, L., Sturgess, P., Russell, C., Sengupta, S., Bastanlar, Y., Clocksin, W., Torr, P.H.S.: Joint optimisation for object class segmentation and dense stereo reconstruction. Int. J. Comput. Vis. 100(2), 122–133 (2012)

  21. Leibe, B., Cornelis, N., Cornelis, K., Van Gool, L.: Dynamic 3D scene analysis from a moving vehicle. In: Proceedings of IEEE Computer Vision and Pattern Recognition, Minneapolis, pp. 1–8 (2007)

  22. Lin, H.T., Lin, C.J., Weng, R.C.: A note on Platt's probabilistic outputs for support vector machines. Mach. Learn. 68(3), 267–276 (2007)

  23. Quost, B., Masson, M.H., Denœux, T.: Classifier fusion in the Dempster–Shafer framework using optimized t-norm based combination rules. Int. J. Approx. Reason. 52(3), 353–374 (2011)

  24. Ren, C.Y., Reid, I.: gSLIC: a real-time implementation of SLIC superpixel segmentation. Technical report, University of Oxford, Department of Engineering Science (2011)

  25. Rodríguez, S.A., Frémont, V., Bonnifait, P., Cherfaoui, V.: Multi-modal object detection and localization for high integrity driving assistance. Mach. Vis. Appl. 14, 1–16 (2011)

  26. Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)

  27. Smets, P.: Belief functions: the disjunctive rule of combination and the generalized bayesian theorem. Int. J. Approx. Reason. 9(1), 1–35 (1993)

  28. Smets, P., Kennes, R.: The transferable belief model. Artif. Intell. 66, 191–243 (1994)

  29. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, Cambridge (2005)

  30. Walley, P.: Statistical reasoning with imprecise probabilities. Chapman and Hall, New York (1991)

  31. Wang, C.C., Thorpe, C., Thrun, S., Hebert, M., Durrant-Whyte, H.: Simultaneous localization, mapping and moving object tracking. Int. J. Robot. Res. 26(9), 889–916 (2007)

  32. Wedel, A., Badino, H., Rabe, C., Loose, H., Franke, U., Cremers, D.: B-spline modeling of road surfaces with an application to free-space estimation. IEEE Trans. Intell. Transp. Syst. 10(4), 572–583 (2009)

  33. Werlberger, M.: Convex approaches for high performance video processing. Ph.D. thesis, Institute for Computer Graphics and Vision, Graz University of Technology, Graz (2012)

  34. Wojek, C., Schiele, B.: A dynamic conditional random field model for joint labeling of object and scene classes. In: Proceedings of European Conference on Computer Vision, pp. 733–747 (2008)

  35. Xu, Ph., Davoine, F., Bordes, J.B., Zhao, H., Denœux, T.: Information fusion on oversegmented images: An application for urban scene understanding. In: Proc. Int. Conf. on Machine Vision and Application, Kyoto, pp. 189–193 (2013)

  36. Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: Hamprecht, F., Schnörr, C., Jähne, B. (eds.) Pattern Recognition. Lecture Notes in Computer Science, vol. 4713, pp. 214–223. Springer, Berlin (2007)

Acknowledgments

This work was carried out in the framework of the Labex MS2T, which was funded by the French Government, through the program “Investments for the future” managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02). It was supported and funded by the Cai Yuanpei project 26193PE from the Chinese Ministry of Education, the French Ministry of Foreign and European Affairs and the French Ministry of Higher Education and Research. It was also supported by the ANR-NSFC Sino-French PRETIV project 61161130528 / ANR-11-IS03-0001.

Author information

Corresponding author

Correspondence to Philippe Xu.

Additional information

This paper is a revised and extended version of [35].

Appendix A: Decision making

In our case study, several arguments can be stated in favor of the optimistic strategy: it is often more conclusive than the pessimistic strategy, it is coherent with frame refinement, and it is computationally efficient. As shown by Barnett [4], given \(k\) plausibility functions, finding the singleton with maximum plausibility of the combined function requires only \(O(k|\varOmega |)\) operations, whereas doing the same for the belief requires \(O(|\varOmega |^k)\) operations. Indeed, we have the following property:

$$\begin{aligned} pl_{1,2}(\{\omega \})&= \frac{1}{1-\kappa }pl_1(\{\omega \})pl_2(\{\omega \}) \end{aligned}$$
(34a)
$$\begin{aligned}&\propto pl_1(\{\omega \})pl_2(\{\omega \}), \quad \forall \omega \in \varOmega . \end{aligned}$$
(34b)

To compute the pignistic probabilities, the combined mass functions need to be explicitly computed, which requires a number of operations exponential in \(|\varOmega |\).
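Property (34) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the three-class frame, the two toy mass functions `m1` and `m2`, and the helper names `dempster` and `contour` are hypothetical and do not come from the paper. It combines the two mass functions explicitly with Dempster's rule and compares the resulting singleton plausibilities with the product shortcut.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns the normalized combined mass function and the conflict kappa.
    """
    combined = {}
    kappa = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            kappa += x * y  # product mass falling on the empty set
    if kappa >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: v / (1.0 - kappa) for s, v in combined.items()}, kappa


def contour(m, frame):
    """Plausibility of each singleton: sum of the masses of focal sets containing it."""
    return {w: sum(v for s, v in m.items() if w in s) for w in frame}


# Toy frame and sources (illustrative values, not taken from the paper).
frame = ("grass", "road", "sky")
m1 = {
    frozenset({"grass"}): 0.6,
    frozenset({"grass", "road"}): 0.3,
    frozenset(frame): 0.1,
}
m2 = {
    frozenset({"road"}): 0.5,
    frozenset({"road", "sky"}): 0.2,
    frozenset(frame): 0.3,
}

# Explicit route: build the combined mass function, then read the plausibilities.
m12, kappa = dempster(m1, m2)
pl12 = contour(m12, frame)

# Shortcut of property (34): multiply the singleton plausibilities of each source.
pl1, pl2 = contour(m1, frame), contour(m2, frame)
shortcut = {w: pl1[w] * pl2[w] / (1.0 - kappa) for w in frame}

best_explicit = max(frame, key=pl12.get)
best_shortcut = max(frame, key=lambda w: pl1[w] * pl2[w])  # no normalization needed
```

Both routes select the same singleton, but the shortcut never enumerates intersections of focal sets, whereas the pignistic transform would need the full dictionary `m12`.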

To show the differences between decision-making strategies, let us consider the following mass function defined on the three-class frame \(\varOmega = \{\text {grass}, \text {road}, \omega _3\}\), where \(\omega _3\) stands for the remaining class:

$$\begin{aligned} m^\varOmega (\{\text {grass, road}\}) = 0.2, \end{aligned}$$
(35a)
$$\begin{aligned} m^\varOmega (\{\text {grass}, \omega _3\}) = 0.3, \end{aligned}$$
(35b)
$$\begin{aligned} m^\varOmega (\{\text {road}, \omega _3\}) = 0.5. \end{aligned}$$
(35c)

Table 7 shows the beliefs, plausibilities and pignistic probabilities on the singletons. Here, the pessimistic strategy cannot lead to any decision: in the worst-case scenario, any decision could be wrong given the current mass function. Choosing {grass} instead of \(\{\omega _3\}\) would be wrong if the masses \(m^\varOmega (\{\text {grass}, \omega _3\})\) and \(m^\varOmega (\{\text {road}, \omega _3\})\) were actually entirely related to \(\{\omega _3\}\). Conversely, choosing \(\{\omega _3\}\) would also be wrong if the same masses were related to {grass} and {road}, respectively. On the other hand, both \(pl^\varOmega \) and Bet\(P^\varOmega \) would lead to \(\{\omega _3\}\), which seems quite reasonable.

Table 7 bel\(^\varOmega \), \(pl^\varOmega \) and Bet\(P^\varOmega \) from mass function (35)

Now, if the singleton \(\{\omega _3\}\) is refined into {tree, obstacle, sky}, the mass function (35) will simply be rewritten as:

$$\begin{aligned}&m^\varTheta (\{\text {grass, road}\}) = 0.2,\end{aligned}$$
(36a)
$$\begin{aligned}&m^\varTheta (\{\text {grass, tree, obstacle, sky}\}) = 0.3, \end{aligned}$$
(36b)
$$\begin{aligned}&m^\varTheta (\{\text {road, tree, obstacle, sky}\}) = 0.5. \end{aligned}$$
(36c)

Table 8 shows the measures induced by this new mass function. Following Bet\(P^\varTheta \), the decision is changed and now leads to {road}. In contrast, the plausibility criterion does not discriminate between {tree}, {obstacle} and {sky}, which are still more plausible than {grass} and {road}. The optimistic strategy thus remains coherent with its previous decision.

Table 8 bel\(^\varTheta \), \(pl^\varTheta \) and Bet\(P^\varTheta \) from mass function (36)
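The measures discussed for mass function (36) can be recomputed directly from its three focal sets. The following is a minimal sketch: the helper names `bel`, `pl` and `betp` are hypothetical, and the pignistic transform is written for a normalized mass function (no mass on the empty set), which holds for (36).

```python
def bel(m, a):
    """Belief of subset a: total mass of the focal sets included in a."""
    return sum(v for s, v in m.items() if s <= a)


def pl(m, a):
    """Plausibility of subset a: total mass of the focal sets intersecting a."""
    return sum(v for s, v in m.items() if s & a)


def betp(m, w):
    """Pignistic probability of element w: each focal set shares its mass equally.

    Assumes a normalized mass function (no mass on the empty set).
    """
    return sum(v / len(s) for s, v in m.items() if w in s)


# Refined frame and mass function (36) as given in the text.
theta = ("grass", "road", "tree", "obstacle", "sky")
m_theta = {
    frozenset({"grass", "road"}): 0.2,
    frozenset({"grass", "tree", "obstacle", "sky"}): 0.3,
    frozenset({"road", "tree", "obstacle", "sky"}): 0.5,
}

# One row per singleton: belief, plausibility, pignistic probability.
rows = {
    w: (bel(m_theta, frozenset({w})), pl(m_theta, frozenset({w})), betp(m_theta, w))
    for w in theta
}
for w, (b, p, q) in rows.items():
    print(f"{w:9s} bel={b:.3f} pl={p:.3f} BetP={q:.3f}")

best_betp = max(theta, key=lambda w: rows[w][2])
max_pl = max(r[1] for r in rows.values())
most_plausible = sorted(w for w in theta if abs(rows[w][1] - max_pl) < 1e-12)
```

With these values, every singleton has zero belief, {tree}, {obstacle} and {sky} share the highest plausibility (0.8, against 0.5 for {grass} and 0.7 for {road}), and {road} has the highest pignistic probability (0.225, against 0.2 for each refined class), which matches the decisions described in the text.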

About this article

Cite this article

Xu, P., Davoine, F., Bordes, JB. et al. Multimodal information fusion for urban scene understanding. Machine Vision and Applications 27, 331–349 (2016). https://doi.org/10.1007/s00138-014-0649-7

  • DOI: https://doi.org/10.1007/s00138-014-0649-7
