DOI: 10.1145/3474085.3475315
research-article

Few-Shot Multi-Agent Perception

Published: 17 October 2021

Abstract

We study few-shot learning (FSL) in multi-agent scenarios, in which participating agents have only scarce local labeled data and need to collaborate to predict the labels of query data. Though each agent, such as a drone or robot, has minimal communication and computation capability, we aim to design coordination schemes such that the agents can collectively perceive the environment accurately and efficiently. We propose a novel metric-based multi-agent FSL framework with three main components: an efficient communication mechanism that propagates compact and fine-grained query feature maps from query agents to support agents; an asymmetric attention mechanism that computes region-level attention weights between query and support feature maps; and a metric-learning module that calculates the image-level relevance between query and support data quickly and accurately. Through analysis and extensive numerical studies, we demonstrate that our approach can reduce communication and computation costs and significantly improve performance in both visual and acoustic perception tasks such as face identification, semantic segmentation, and sound genre recognition.
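To make the second and third components more concrete, the following is a minimal sketch of region-level attention between a query feature map and a support feature map, followed by an image-level relevance score. It is not the authors' implementation. The abstract does not specify the similarity measure or the metric module, so the cosine similarity, the softmax temperature, and all function names (`region_attention`, `image_relevance`, `predict`) are assumptions made for illustration. Each feature map is represented as a plain list of region vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(xs, temperature):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def region_attention(query_regions, support_regions, temperature=0.1):
    """Return (similarities, attention weights), one row per query region.

    The weights are normalized over support regions only, so the
    operation is asymmetric: query regions attend to support regions.
    """
    sims = [[cosine(q, s) for s in support_regions] for q in query_regions]
    weights = [softmax(row, temperature) for row in sims]
    return sims, weights


def image_relevance(query_regions, support_regions, temperature=0.1):
    """Image-level score: mean attention-weighted similarity over query regions."""
    sims, weights = region_attention(query_regions, support_regions, temperature)
    per_region = [
        sum(w * s for w, s in zip(w_row, s_row))
        for w_row, s_row in zip(weights, sims)
    ]
    return sum(per_region) / len(per_region)


def predict(query_regions, support_set):
    """Pick the label whose support feature map is most relevant to the query.

    support_set maps a label to the region vectors held by a support agent.
    """
    scores = {
        label: image_relevance(query_regions, regions)
        for label, regions in support_set.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2-region, 2-dimensional feature maps (hypothetical data).
    query = [[1.0, 0.0], [0.0, 1.0]]
    support = {
        "A": [[1.0, 0.0], [0.0, 1.0]],
        "B": [[-1.0, 0.0], [0.0, -1.0]],
    }
    print(predict(query, support))
```

In a multi-agent setting, only the compact query region vectors would need to be sent to each support agent, which would then return a single relevance score per label; that division of work is also an assumption of this sketch rather than a detail stated in the abstract.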

Supplementary Material

ZIP File (mfp0840aux.zip)
This auxiliary file contains the algorithm and lemma referred to in the main text. The package includes the compiled PDF file along with its TeX source code.

Cited By

  • (2024) CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments. IEEE Robotics and Automation Letters 9:7 (6416-6423). DOI: 10.1109/LRA.2024.3406207. Online publication date: Jul-2024
  • (2023) Few-Shot Multi-Agent Perception With Ranking-Based Feature Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:10 (11810-11823). DOI: 10.1109/TPAMI.2023.3285755. Online publication date: 1-Oct-2023

Information & Contributors

Information

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. few-shot learning
  2. image and audio classification
  3. multi-agent perception
  4. semantic segmentation

Qualifiers

  • Research-article

Funding Sources

  • Shenzhen Institute of Artificial Intelligence and Robotics for Society
  • Presidential Fund from the Chinese University of Hong Kong, Shenzhen

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 51
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 20 Jan 2025
