Multimodal information fusion for urban scene understanding

Xu, Philippe; Davoine, Franck; Bordes, Jean-Baptiste; Zhao, Huijing; Denœux, Thierry

doi:10.1007/s00138-014-0649-7


Special Issue Paper
Published: 16 December 2014

Volume 27, pages 331–349, (2016)
Machine Vision and Applications

Philippe Xu¹,², Franck Davoine², Jean-Baptiste Bordes¹, Huijing Zhao² & Thierry Denœux¹


Abstract

This paper addresses the problem of scene understanding for driver assistance systems. To recognize the large number of objects that may be found on the road, several sensors and decision algorithms have to be used. The proposed approach is based on the representation of all available information in over-segmented image regions. The main novelty of the framework is its capability to incorporate new classes of objects and to include new sensors or detection methods while remaining robust to sensor failures. Several classes, such as ground, vegetation and sky, are considered, as well as three different sensors. The approach was evaluated on real, publicly available urban driving scene data.
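As a rough illustration of how an evidential framework can absorb an added or failed source, the sketch below assumes that each source outputs, for one over-segmented region, a mass function over a set of classes, and combines these with Dempster's rule. This is a minimal sketch under stated assumptions, not the authors' implementation: the class set, the sensor names `sensor_1`/`sensor_2` and all mass values are hypothetical, and the paper's actual combination and decision pipeline may differ. A failed source is replaced by the vacuous mass function (all mass on the whole frame), and the decision uses maximum plausibility.

```python
from __future__ import annotations

from itertools import product

# A mass function maps focal sets (frozensets of class labels) to masses.
Mass = dict[frozenset[str], float]

# Hypothetical class set for one over-segmented region.
FRAME = frozenset({"ground", "vegetation", "sky", "obstacle"})
# Vacuous mass function: total ignorance, all mass on the whole frame.
VACUOUS: Mass = {FRAME: 1.0}


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: conjunctive combination, then normalization by 1 - conflict."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def fuse_region(sources: list[Mass | None]) -> Mass:
    """Fuse per-region outputs; a failed source (None) contributes the vacuous mass."""
    fused: Mass = dict(VACUOUS)
    for m in sources:
        fused = dempster(fused, VACUOUS if m is None else m)
    return fused


def plausibility(m: Mass, w: str) -> float:
    """pl({w}): total mass of the focal sets that contain w."""
    return sum(v for s, v in m.items() if w in s)


def decide(m: Mass) -> str:
    # Optimistic rule: pick the singleton with maximum plausibility.
    return max(sorted(FRAME), key=lambda w: plausibility(m, w))


# Hypothetical outputs of two working sensors for one region; a third sensor has failed.
sensor_1: Mass = {frozenset({"ground"}): 0.6, FRAME: 0.4}
sensor_2: Mass = {frozenset({"ground", "vegetation"}): 0.7, FRAME: 0.3}

fused = fuse_region([sensor_1, sensor_2, None])
print(decide(fused))
```

Because the vacuous mass function is the neutral element of Dempster's rule, a missing source leaves the fused result unchanged, and adding a source only adds one more combination step.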


References

[1] Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
[2] Badino, H., Franke, U., Mester, R.: Free space computation using stochastic occupancy grids and dynamic programming. In: Proceedings of International Conference on Computer Vision Workshop on Dynamical Vision, Rio de Janeiro (2007)
[3] Bansal, M., Jung, S.H., Matei, B., Eledath, J., Sawhney, H.S.: A real-time pedestrian detection system based on structure and appearance classification. In: Proceedings of IEEE International Conference on Robotics and Automation, Anchorage, pp. 903–909 (2010)
[4] Barnett, J.A.: Calculating Dempster–Shafer plausibility. IEEE Trans. Pattern Anal. Mach. Intell. 13(6), 599–602 (1991)
[5] Bordes, J.B., Davoine, F., Xu, P., Denœux, T.: Evidential grammars for image interpretation: application to multimodal traffic scene understanding. In: Qin, Z., Huynh, V.N. (eds.) Integrated Uncertainty in Knowledge Modelling and Decision Making, Beijing, pp. 65–78 (2013)
[6] Cobb, B.R., Shenoy, P.P.: On the plausibility transformation method for translating belief function models to probability models. Int. J. Approx. Reason. 41(3), 314–330 (2006)
[7] Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
[8] Denœux, T.: Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognit. 30(7), 1095–1107 (1997)
[9] Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 34(4), 743–761 (2011)
[10] Dubois, D., Prade, H., Smets, P.: A definition of subjective possibility. Int. J. Approx. Reason. 48(2), 352–364 (2008)
[11] Ess, A., Müller, T., Grabner, H., Van Gool, L.: Segmentation-based urban traffic scene understanding. In: Proceedings of British Machine Vision Conference, London, pp. 84.1–84.11 (2009)
[12] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
[13] Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
[14] Fröhlich, B., Rodner, E., Kemmler, M., Denzler, J.: Large-scale Gaussian process multi-class classification for semantic segmentation and facade recognition. Mach. Vis. Appl. 24(5), 1043–1053 (2013)
[15] Geiger, A., Lenz, P., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)
[16] Geiger, A., Roser, M., Urtasun, R.: Efficient large-scale stereo matching. In: Proceedings of Asian Conference on Computer Vision, Queenstown, pp. 25–38 (2010)
[17] Hoiem, D., Efros, A., Hebert, M.: Recovering surface layout from an image. Int. J. Comput. Vis. 75(1), 151–172 (2007)
[18] Khaleghi, B., Khamis, A., Karray, F.O., Razavi, S.N.: Multisensor data fusion: a review of the state-of-the-art. Inf. Fusion 14, 28–44 (2013)
[19] Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)
[20] Ladický, L., Sturgess, P., Russell, C., Sengupta, S., Bastanlar, Y., Clocksin, W., Torr, P.H.S.: Joint optimisation for object class segmentation and dense stereo reconstruction. Int. J. Comput. Vis. 100(2), 122–133 (2012)
[21] Leibe, B., Cornelis, N., Cornelis, K., Van Gool, L.: Dynamic 3D scene analysis from a moving vehicle. In: Proceedings of IEEE Computer Vision and Pattern Recognition, Minneapolis, pp. 1–8 (2007)
[22] Lin, H.T., Lin, C.J., Weng, R.C.: A note on Platt's probabilistic outputs for support vector machines. Mach. Learn. 68(3), 267–276 (2007)
[23] Quost, B., Masson, M.H., Denœux, T.: Classifier fusion in the Dempster–Shafer framework using optimized t-norm based combination rules. Int. J. Approx. Reason. 52(3), 353–374 (2011)
[24] Ren, C.Y., Reid, I.: gSLIC: a real-time implementation of SLIC superpixel segmentation. Technical report, University of Oxford, Department of Engineering Science (2011)
[25] Rodríguez, S.A., Frémont, V., Bonnifait, P., Cherfaoui, V.: Multi-modal object detection and localization for high integrity driving assistance. Mach. Vis. Appl. 14, 1–16 (2011)
[26] Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)
[27] Smets, P.: Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem. Int. J. Approx. Reason. 9(1), 1–35 (1993)
[28] Smets, P., Kennes, R.: The transferable belief model. Artif. Intell. 66, 191–234 (1994)
[29] Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, Cambridge (2005)
[30] Walley, P.: Statistical reasoning with imprecise probabilities. Chapman and Hall, New York (1991)
[31] Wang, C.C., Thorpe, C., Thrun, S., Hebert, M., Durrant-Whyte, H.: Simultaneous localization, mapping and moving object tracking. Int. J. Robot. Res. 26(9), 889–916 (2007)
[32] Wedel, A., Badino, H., Rabe, C., Loose, H., Franke, U., Cremers, D.: B-spline modeling of road surfaces with an application to free-space estimation. IEEE Trans. Intell. Transp. Syst. 10(4), 572–583 (2009)
[33] Werlberger, M.: Convex approaches for high performance video processing. Ph.D. thesis, Institute for Computer Graphics and Vision, Graz University of Technology, Graz (2012)
[34] Wojek, C., Schiele, B.: A dynamic conditional random field model for joint labeling of object and scene classes. In: Proceedings of European Conference on Computer Vision, pp. 733–747 (2008)
[35] Xu, P., Davoine, F., Bordes, J.B., Zhao, H., Denœux, T.: Information fusion on oversegmented images: an application for urban scene understanding. In: Proceedings of International Conference on Machine Vision Applications, Kyoto, pp. 189–193 (2013)
[36] Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: Hamprecht, F., Schnörr, C., Jähne, B. (eds.) Pattern Recognition. Lecture Notes in Computer Science, vol. 4713, pp. 214–223. Springer, Berlin (2007)


Acknowledgments

This work was carried out in the framework of the Labex MS2T, which was funded by the French Government, through the program “Investments for the future” managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02). It was supported and funded by the Cai Yuanpei project 26193PE from the Chinese Ministry of Education, the French Ministry of Foreign and European Affairs and the French Ministry of Higher Education and Research. It was also supported by the ANR-NSFC Sino-French PRETIV project 61161130528 / ANR-11-IS03-0001.

Author information

Authors and Affiliations

UMR CNRS 7253, Heudiasyc, Université de Technologie de Compiègne, BP 20529, 60205, Compiègne Cedex, France
Philippe Xu, Jean-Baptiste Bordes & Thierry Denœux
LIAMA, CNRS, Key Lab of Machine Perception (MOE), Peking University, Beijing, People’s Republic of China
Philippe Xu, Franck Davoine & Huijing Zhao


Corresponding author

Correspondence to Philippe Xu.

Additional information

This paper is a revised and extended version of [35].

Appendix A: Decision making

In our case study, several arguments can be made in favor of the optimistic strategy: it is often more conclusive than the pessimistic strategy, it is coherent with frame refinement, and it is computationally efficient. As shown by Barnett [4], given $k$ plausibility functions, finding the singleton with maximum plausibility of the combined function requires only $O(k|\varOmega |)$ operations, whereas the same search for the belief requires $O(|\varOmega |^k)$ operations. This is a consequence of the following property of Dempster's rule, in which $\kappa$ denotes the degree of conflict between $m_1$ and $m_2$:

$$\begin{aligned} pl_{1,2}(\{\omega \})&= \frac{1}{1-\kappa }pl_1(\{\omega \})pl_2(\{\omega \}) \end{aligned}$$

(34a)

$$\begin{aligned}&\propto pl_1(\{\omega \})pl_2(\{\omega \}), \quad \forall \omega \in \varOmega . \end{aligned}$$

(34b)

To compute the pignistic probabilities, the combined mass functions need to be explicitly computed, which requires a number of operations exponential in $|\varOmega |$.
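Property (34) can be checked numerically. The sketch below, assuming mass functions stored as dictionaries from frozensets to masses, finds the most plausible singleton in two ways: from the product of the individual contour functions $pl_i(\{\omega\})$, which never builds the combined mass function, and from an explicit Dempster combination, which is what the pignistic transform would need. The frame is assumed to be {grass, road, tree, obstacle, sky}, and the three mass functions `m1`, `m2`, `m3` are invented for illustration.

```python
from __future__ import annotations

from functools import reduce
from itertools import product
from math import prod

Mass = dict[frozenset[str], float]

THETA = frozenset({"grass", "road", "tree", "obstacle", "sky"})


def contour(m: Mass) -> dict[str, float]:
    """Plausibility of every singleton, pl({w}), i.e. the contour function."""
    return {w: sum(v for s, v in m.items() if w in s) for w in THETA}


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Explicit Dempster combination; the number of focal sets can grow quickly."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def best_by_contours(masses: list[Mass]) -> str:
    # By (34), the combined pl({w}) is proportional to the product of the
    # individual pl_i({w}): k * |THETA| multiplications, no combined masses.
    contours = [contour(m) for m in masses]
    return max(sorted(THETA), key=lambda w: prod(c[w] for c in contours))


def best_by_full_combination(masses: list[Mass]) -> str:
    # Reference path: build the combined mass function, then read off pl.
    pl = contour(reduce(dempster, masses))
    return max(sorted(THETA), key=lambda w: pl[w])


# Three hypothetical sources (values invented for illustration).
m1: Mass = {frozenset({"road"}): 0.5, frozenset({"grass", "road"}): 0.3, THETA: 0.2}
m2: Mass = {frozenset({"grass", "road"}): 0.6, frozenset({"tree", "obstacle", "sky"}): 0.4}
m3: Mass = {frozenset({"road", "obstacle"}): 0.7, THETA: 0.3}

print(best_by_contours([m1, m2, m3]), best_by_full_combination([m1, m2, m3]))
```

Both paths select {road} here; the ratio between the combined plausibility and the product of contour functions is the same constant $1/(1-\kappa)$ for every singleton, which is why the normalization can be ignored when only the argmax is needed.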

To show the differences between the decision-making strategies, let us consider the following mass function defined on $\varOmega $:

$$\begin{aligned} m^\varOmega (\{\text {grass, road}\}) = 0.2, \end{aligned}$$

(35a)

(35b)

(35c)

Table 7 shows the beliefs, plausibilities and pignistic probabilities on the singletons. Here, the pessimistic strategy cannot lead to any decision: since no mass is assigned to any singleton, bel$^\varOmega $ is zero on all of them, and in the worst-case scenario any decision could be wrong given the current mass function. Choosing {grass} instead of would be wrong if the masses and were actually entirely related to . Conversely, the other decision would also be wrong if the same masses were instead related to {grass} and {road}, respectively. On the other hand, both $pl^\varOmega $ and Bet$P^\varOmega $ would lead to , which seems quite reasonable.

Table 7 bel$^\varOmega $, $pl^\varOmega $ and Bet$P^\varOmega $ from mass function (35)


Now, if the singleton is refined into {tree, obstacle, sky}, yielding a finer frame $\varTheta $, the mass function (35) is simply rewritten as:

$$\begin{aligned}&m^\varTheta (\{\text {grass, road}\}) = 0.2,\end{aligned}$$

(36a)

$$\begin{aligned}&m^\varTheta (\{\text {grass, tree, obstacle, sky}\}) = 0.3, \end{aligned}$$

(36b)

$$\begin{aligned}&m^\varTheta (\{\text {road, tree, obstacle, sky}\}) = 0.5. \end{aligned}$$

(36c)

Table 8 shows the measures induced by this new mass function. Following Bet$P^\varTheta $, the decision is changed and now leads to {road}. In contrast, the plausibility criterion does not discriminate between {tree}, {obstacle} and {sky}, which are still more plausible than {grass} and {road}. The optimistic strategy thus remains coherent with its previous decision.
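The measures behind Tables 7 and 8 can be recomputed with a short script. The sketch below takes mass function (36) as given and, since (36) is (35) rewritten after refinement, recovers (35) by coarsening {tree, obstacle, sky} back into a single element; that coarse singleton is given the hypothetical label `X`, which is not the paper's class name. The helpers `coarsen`, `bel`, `pl`, `betp` and `maximizers` are illustrative names, not an existing API.

```python
from __future__ import annotations

from collections.abc import Callable

Mass = dict[frozenset[str], float]

# Refined frame and mass function (36).
THETA = ["grass", "road", "tree", "obstacle", "sky"]
m_theta: Mass = {
    frozenset({"grass", "road"}): 0.2,
    frozenset({"grass", "tree", "obstacle", "sky"}): 0.3,
    frozenset({"road", "tree", "obstacle", "sky"}): 0.5,
}


def coarsen(m: Mass, mapping: dict[str, str]) -> Mass:
    """Map each fine label to its coarse label and accumulate the masses."""
    out: Mass = {}
    for s, v in m.items():
        key = frozenset(mapping.get(w, w) for w in s)
        out[key] = out.get(key, 0.0) + v
    return out


# "X" is a hypothetical label for the singleton refined into {tree, obstacle, sky}.
OMEGA = ["grass", "road", "X"]
m_omega = coarsen(m_theta, {"tree": "X", "obstacle": "X", "sky": "X"})


def bel(m: Mass, w: str) -> float:
    return m.get(frozenset({w}), 0.0)  # {w} is its own only non-empty subset


def pl(m: Mass, w: str) -> float:
    return sum(v for s, v in m.items() if w in s)


def betp(m: Mass, w: str) -> float:
    # Pignistic transform: each focal set shares its mass equally among its elements.
    return sum(v / len(s) for s, v in m.items() if w in s)


def maximizers(m: Mass, frame: list[str], measure: Callable[[Mass, str], float],
               tol: float = 1e-9) -> list[str]:
    """All singletons reaching the maximum of the measure (ties included)."""
    scores = {w: measure(m, w) for w in frame}
    top = max(scores.values())
    return sorted(w for w, s in scores.items() if s >= top - tol)


for name, m, frame in (("Omega", m_omega, OMEGA), ("Theta", m_theta, THETA)):
    for measure in (bel, pl, betp):
        row = {w: round(measure(m, w), 3) for w in frame}
        print(name, measure.__name__, row, "->", maximizers(m, frame, measure))
```

On $\varTheta$, the pignistic probabilities are 0.175 for grass, 0.225 for road and 0.2 for each of tree, obstacle and sky, so Bet$P^\varTheta$ switches to {road}; the plausibilities stay at 0.5, 0.7 and 0.8, exactly the values of grass, road and the coarse element on $\varOmega$. The belief of every singleton is zero on both frames, so the pessimistic rule returns all singletons as tied, i.e. no decision.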

Table 8 bel$^\varTheta $, $pl^\varTheta $ and Bet$P^\varTheta $ from mass function (36)



About this article

Cite this article

Xu, P., Davoine, F., Bordes, JB. et al. Multimodal information fusion for urban scene understanding. Machine Vision and Applications 27, 331–349 (2016). https://doi.org/10.1007/s00138-014-0649-7


Received: 30 September 2013
Revised: 25 October 2014
Accepted: 31 October 2014
Published: 16 December 2014
Issue Date: April 2016
DOI: https://doi.org/10.1007/s00138-014-0649-7
