Abstract
In this paper, we propose MIMF, an adaptive multimodal fusion network driven by the mutual information between the input data and the target recognition pattern. Owing to varying weather and road conditions, real scenes can be far more complicated than those in the training dataset, which poses a non-negligible challenge for multimodal fusion models that follow fixed fusion schemes, especially in autonomous driving. To address this problem, we leverage mutual information, which measures the dependence between the input and the target output, for adaptive modality selection during fusion. We design an MI-based weight-fusion module and integrate it into our feature-fusion lane line segmentation network. We evaluate the method on the KITTI and A2D2 datasets, where we simulate extreme sensor malfunctions such as modality loss. The results demonstrate the benefit of our method in practical applications and inform future research on multimodal fusion.
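The paper text here does not include code, so the following PyTorch-style sketch is an illustration only of the general idea: per-modality feature maps (e.g. camera and LiDAR) are scored, the scores are normalised into fusion weights, and the weighted maps are summed. The class name MIWeightedFusion and the small per-modality score heads are hypothetical stand-ins for the learned MI-based measure described in the abstract (cf. Belghazi et al., mutual information neural estimation); this is not the authors' implementation.

    # Illustrative sketch (not the authors' code): adaptively weighted fusion
    # of per-modality feature maps. The score heads below are hypothetical
    # stand-ins for a learned mutual-information estimator.
    import torch
    import torch.nn as nn

    class MIWeightedFusion(nn.Module):
        def __init__(self, channels, num_modalities=2):
            super().__init__()
            # One lightweight score head per modality; in the paper this role
            # is played by an MI-based measure between modality and target.
            self.score_heads = nn.ModuleList([
                nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1))
                for _ in range(num_modalities)
            ])

        def forward(self, features):
            # features: list of (B, C, H, W) tensors, one per modality
            scores = torch.cat([head(f) for head, f in zip(self.score_heads, features)], dim=1)
            weights = torch.softmax(scores, dim=1)    # (B, M), adaptive per sample
            stacked = torch.stack(features, dim=1)    # (B, M, C, H, W)
            return (weights[:, :, None, None, None] * stacked).sum(dim=1)

    # Example: fuse camera and LiDAR feature maps of the same shape.
    cam, lidar = torch.randn(1, 64, 32, 96), torch.randn(1, 64, 32, 96)
    fused = MIWeightedFusion(channels=64)([cam, lidar])   # shape (1, 64, 32, 96)

Under this scheme, a degraded or missing modality would receive a low weight at inference time, which is the behaviour the modality-loss experiments on KITTI and A2D2 are meant to probe.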
References
Feng, D., et al.: Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Trans. Intell. Transport. Syst. (2019)
Caltagirone, L., Bellone, M., Svensson, L., Wahde, M.: LIDAR-camera fusion for road detection using fully convolutional neural networks. Robot. Autonom. Syst. 111, 125–131 (2019)
Bijelic, M., et al.: Seeing through fog without seeing fog: deep multimodal sensor fusion in unseen adverse weather. In: CVPR (2020)
Carballo, A., et al.: LIBRE: the multiple 3D LiDAR dataset. arXiv preprint arXiv:2003.06129 (2020)
Vora, S., Lang, A., Helou, B., Beijbom, O.: PointPainting: sequential fusion for 3D object detection. In: CVPR (2020)
Qi, C.R., Liu, W., Wu, C., Su, H., Guibas, L.: Frustum PointNets for 3D object detection from RGB-D data. In: CVPR (2018)
Su, Y., Gao, Y., Zhang, Y., Álvarez, J.M., Yang, J., Kong, H.: An illumination-invariant nonparametric model for urban road detection. IEEE Trans. Intell. Vehicles 4, 14–23 (2019)
Yang, B., Liang, M., Urtasun, R.: HDNET: exploiting HD maps for 3D object detection. In: CoRL (2018)
Kim, J., Koh, J., Kim, Y., Choi, J., Hwang, Y., Choi, J.W.: Robust deep multi-modal learning based on gated information fusion network. In: ACCV (2018)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Gabrié, M., et al.: Entropy and mutual information in models of deep neural networks. In: NeurIPS (2018)
Belghazi, M.I., et al.: Mutual information neural estimation. In: ICML (2018)
Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. In: ICLR (2019)
Bramon, R., et al.: Multimodal data fusion based on mutual information. IEEE Trans. Visual. Comput. Graph. 18, 1574–1587 (2012)
Yousef, A., Iftekharuddin, K.: Shoreline extraction from the fusion of LiDAR DEM data and aerial images using mutual information and genetic algorithms. In: IJCNN (2014)
Pan, X., Shi, J., Luo, P., Wang, X., Tang, X.: Spatial as deep: spatial CNN for traffic scene understanding. In: AAAI (2018)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)
Geyer, J., et al.: A2D2: Audi autonomous driving dataset. arXiv preprint arXiv:2004.06320 (2020)
Acknowledgements
This work was supported by the National High Technology Research and Development Program of China under Grant No. 2018YFE0204300, the Beijing Science and Technology Plan Project No. Z191100007419008, the Guoqiang Research Institute Project No. 2019GQG1010, and the National Natural Science Foundation of China under Grant No. U1964203.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Zou, Z., Zhao, L., Zhang, X., Li, Z., Jin, D., Luo, T. (2021). MIMF: Mutual Information-Driven Multimodal Fusion. In: Sun, F., Liu, H., Fang, B. (eds) Cognitive Systems and Signal Processing. ICCSIP 2020. Communications in Computer and Information Science, vol 1397. Springer, Singapore. https://doi.org/10.1007/978-981-16-2336-3_13
Print ISBN: 978-981-16-2335-6
Online ISBN: 978-981-16-2336-3