DOI: 10.1145/3691555.3696829

Semi-Supervised Multi-modal Sensor Fusion Framework for In-Vehicle Networks

Published: 19 November 2024

Abstract

With the rapid development of technologies such as autonomous driving, vehicle-to-everything communication, and edge computing, an increasing number of vehicles are equipped with multiple sensors to perceive their surroundings. As a result, the volume of sensing data has exploded, and the communication pressure on the in-vehicle network has become severe. In-sensor and near-sensor computation are considered effective ways to address these issues. However, current multi-modal fusion frameworks are difficult to modularize and to train in a distributed manner across multiple devices. In this paper, we propose a variational autoencoder (VAE) based multi-modal fusion solution together with a theoretical framework for its analysis. Notably, we design two auxiliary tasks that use data from a single modality to discover the joint distribution of multiple modalities. Compared to traditional algorithms, the proposed solution can exploit unlabeled data for self-supervised learning and has the added advantage of modularity, which helps reduce the communication overhead in in-vehicle networks. Experiments show that, compared to single-modality algorithms, our multi-modal fusion framework increases average precision by over 10% on the KITTI dataset.
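
The abstract does not give model details, so below is a minimal sketch of one plausible instantiation, not the paper's actual architecture: each sensor gets its own VAE encoder, the per-modality Gaussians are fused product-of-experts style into one shared latent, and two auxiliary single-modality passes (a stand-in for the paper's two auxiliary tasks) require each sensor alone to reconstruct both modalities, so unlabeled single-sensor data still shapes the joint distribution. PyTorch is assumed; all module names, dimensions, and loss weights are illustrative.

# Minimal sketch of a VAE-based multi-modal fusion model. This is an
# illustrative assumption of the design, not the paper's published code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Encodes one sensor modality (e.g. camera or LiDAR features)
    into the parameters of a Gaussian over a shared latent space."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class FusionVAE(nn.Module):
    def __init__(self, cam_dim=64, lidar_dim=32, latent_dim=16):
        super().__init__()
        self.enc_cam = ModalityEncoder(cam_dim, latent_dim)
        self.enc_lidar = ModalityEncoder(lidar_dim, latent_dim)
        # Decoders reconstruct each modality from the shared latent,
        # so data from one sensor can supervise the other branch.
        self.dec_cam = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, cam_dim))
        self.dec_lidar = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, lidar_dim))

    @staticmethod
    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, cam=None, lidar=None):
        # Product-of-experts fusion when both modalities are present;
        # falls back to a single expert when only one sensor reports.
        mus, logvars = [], []
        if cam is not None:
            m, lv = self.enc_cam(cam); mus.append(m); logvars.append(lv)
        if lidar is not None:
            m, lv = self.enc_lidar(lidar); mus.append(m); logvars.append(lv)
        precisions = [torch.exp(-lv) for lv in logvars]
        prec = sum(precisions) + 1.0  # +1 for a unit-Gaussian prior expert
        mu = sum(m * p for m, p in zip(mus, precisions)) / prec
        logvar = -torch.log(prec)
        z = self.reparameterize(mu, logvar)
        return self.dec_cam(z), self.dec_lidar(z), mu, logvar

def loss_fn(model, cam, lidar, beta=0.1):
    # Auxiliary single-modality passes: each sensor alone must still
    # reconstruct *both* modalities, tying the branches to one joint latent.
    total = 0.0
    for inputs in [dict(cam=cam, lidar=lidar), dict(cam=cam), dict(lidar=lidar)]:
        cam_hat, lidar_hat, mu, logvar = model(**inputs)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        total = total + F.mse_loss(cam_hat, cam) \
                      + F.mse_loss(lidar_hat, lidar) + beta * kl
    return total

model = FusionVAE()
cam, lidar = torch.randn(8, 64), torch.randn(8, 32)
loss = loss_fn(model, cam, lidar)
loss.backward()
print(f"loss: {loss.item():.3f}")

Because each encoder in such a design is a self-contained module, it could in principle run in or near its sensor and transmit only low-dimensional latent statistics over the in-vehicle network, which is consistent with the modularity and communication-overhead claims in the abstract.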


    Published In

    MobiArch '24: Proceedings of the 19th Workshop on Mobility in the Evolving Internet Architecture
    November 2024
    51 pages
ISBN: 9798400712470
DOI: 10.1145/3691555


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. In-vehicle networks
    2. multi-modality learning
    3. variational autoencoder

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACM MobiCom '24

    Acceptance Rates

    Overall Acceptance Rate 47 of 92 submissions, 51%
