Research article · DOI: 10.1145/3581783.3611699

What2comm: Towards Communication-efficient Collaborative Perception via Feature Decoupling

Published: 27 October 2023

ABSTRACT

Multi-agent collaborative perception has recently received increasing attention as an emerging application in driving scenarios. Despite advances in previous approaches, challenges remain due to redundant communication patterns and vulnerable collaboration processes. To address these issues, we propose What2comm, an end-to-end collaborative perception framework that achieves a trade-off between perception performance and communication bandwidth. Our contributions are threefold. First, we design an efficient communication mechanism based on feature decoupling that transmits exclusive and common feature maps among heterogeneous agents, providing perceptually holistic messages. Second, we introduce a spatio-temporal collaboration module that integrates complementary information from collaborators with temporal ego cues, yielding a collaboration procedure robust to transmission delay and localization errors. Finally, we propose a common-aware fusion strategy that refines the final representations with informative common features. Comprehensive experiments in real-world and simulated scenarios demonstrate the effectiveness of What2comm.
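To make the pipeline concrete, below is a minimal PyTorch sketch of the decouple-transmit-fuse idea the abstract describes. Every design choice here (the 1x1-convolution heads, the sigmoid gating rule, and the plain averaging that stands in for the spatio-temporal collaboration module) is an illustrative assumption, not the authors' implementation.

# Hypothetical sketch: decouple each agent's BEV features into common and
# exclusive maps, "transmit" them, and fuse at the ego agent. Shapes and
# layer choices are assumptions for illustration only.
import torch
import torch.nn as nn

class FeatureDecoupler(nn.Module):
    # Two 1x1-conv heads split one feature map into a common part (scene
    # context shared across agents) and an exclusive part (information
    # only this agent observes).
    def __init__(self, c: int):
        super().__init__()
        self.common = nn.Conv2d(c, c, kernel_size=1)
        self.exclusive = nn.Conv2d(c, c, kernel_size=1)

    def forward(self, feat):
        return self.common(feat), self.exclusive(feat)

class CommonAwareFusion(nn.Module):
    # Aggregates the ego feature with collaborators' exclusive maps, then
    # refines the result with a gate derived from the common maps.
    def __init__(self, c: int):
        super().__init__()
        self.gate = nn.Conv2d(2 * c, 1, kernel_size=1)

    def forward(self, ego_feat, received_exclusive, received_common):
        # Plain averaging is a placeholder for the paper's spatio-temporal
        # collaboration module.
        agg = torch.stack(received_exclusive, dim=0).mean(dim=0)
        fused = ego_feat + agg
        # Common features modulate the fused map via a sigmoid gate.
        common = torch.stack(received_common, dim=0).mean(dim=0)
        w = torch.sigmoid(self.gate(torch.cat([fused, common], dim=1)))
        return w * fused + (1 - w) * ego_feat

# Toy usage: 3 agents, 64-channel 100x100 BEV grids.
C, H, W = 64, 100, 100
decoupler, fusion = FeatureDecoupler(C), CommonAwareFusion(C)
feats = [torch.randn(1, C, H, W) for _ in range(3)]  # ego + 2 collaborators
commons, exclusives = zip(*(decoupler(f) for f in feats))
out = fusion(feats[0], list(exclusives[1:]), list(commons[1:]))
print(out.shape)  # torch.Size([1, 64, 100, 100])

In the paper's actual system the averaging step would be replaced by the delay- and pose-robust spatio-temporal module, and the common/exclusive split would be trained with decoupling objectives; this sketch only fixes the data flow.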


Published in

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023 · 9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions (24%)
