research-article

Few-Shot Multi-Agent Perception

Authors:

Jianwei Huang

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 1712 - 1720

https://doi.org/10.1145/3474085.3475315

Published: 17 October 2021

Abstract

We study few-shot learning (FSL) in multi-agent scenarios, in which each participating agent has only scarce local labeled data and the agents must collaborate to predict the labels of query data. Although each agent, such as a drone or a robot, has minimal communication and computation capability, we aim to design coordination schemes that let the agents collectively perceive the environment accurately and efficiently. We propose a novel metric-based multi-agent FSL framework with three main components: an efficient communication mechanism that propagates compact and fine-grained query feature maps from query agents to support agents; an asymmetric attention mechanism that computes region-level attention weights between query and support feature maps; and a metric-learning module that calculates the image-level relevance between query and support data quickly and accurately. Through analysis and extensive numerical studies, we demonstrate that our approach reduces communication and computation costs and significantly improves performance in both visual and acoustic perception tasks such as face identification, semantic segmentation, and sound genre recognition.
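
To make the roles of the attention and metric components more concrete, the sketch below shows one possible reading of the pipeline in plain Python. It is not the authors' implementation: the cosine similarity, the softmax temperature, the averaging rule, and every function name (`region_attention`, `image_relevance`, `predict`) are assumptions chosen for illustration. Each image is assumed to be already encoded as a short list of region feature vectors, and attention flows in one direction only, from query regions to support regions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: Sequence[float], temperature: float = 0.1) -> List[float]:
    """Numerically stable softmax; the temperature value is an assumption."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_attention(query: Sequence[Vector],
                     support: Sequence[Vector]) -> List[List[float]]:
    """One row per query region: attention weights over support regions."""
    return [softmax([cosine(q, s) for s in support]) for q in query]


def image_relevance(query: Sequence[Vector],
                    support: Sequence[Vector]) -> float:
    """Mean over query regions of the attention-weighted similarity."""
    weights = region_attention(query, support)
    per_region = [
        sum(w * cosine(q, s) for w, s in zip(row, support))
        for q, row in zip(query, weights)
    ]
    return sum(per_region) / len(per_region)


def predict(query: Sequence[Vector],
            support_set: Dict[str, Sequence[Vector]]) -> str:
    """Return the label whose support features are most relevant to the query."""
    return max(support_set,
               key=lambda label: image_relevance(query, support_set[label]))


if __name__ == "__main__":
    # Hypothetical toy features: two regions per image, two dimensions each.
    query_regions = [[1.0, 0.0], [0.0, 1.0]]
    supports = {
        "a": [[1.0, 0.1], [0.1, 1.0]],
        "b": [[-1.0, 0.0], [0.0, -1.0]],
    }
    print(predict(query_regions, supports))
```

In this reading, only the compact query region features would need to travel to a support agent, which then computes the relevance score locally and returns a single number per label.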

Supplementary Material

ZIP File (mfp0840aux.zip)

This auxiliary file contains the algorithm and lemma that are referred to in the main text. The package includes the compiled PDF file along with its TeX source code.

282.39 KB

Cited By

Zhou Y, Quang L, Nieto-Granda C, Loianno G. (2024). CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments. IEEE Robotics and Automation Letters 9:7 (6416-6423). Online publication date: Jul-2024.
https://doi.org/10.1109/LRA.2024.3406207
Fan C, Hu J, Huang J. (2023). Few-Shot Multi-Agent Perception With Ranking-Based Feature Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:10 (11810-11823). Online publication date: 1-Oct-2023.
https://dl.acm.org/doi/10.1109/TPAMI.2023.3285755

Index Terms

Few-Shot Multi-Agent Perception
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
      2. Computer vision tasks
    2. Distributed artificial intelligence
      1. Multi-agent systems
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Music retrieval

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen (University of Electronic Science and Technology of China, China)
Yueting Zhuang (Zhejiang University, China)
John R. Smith (IBM, USA)

Program Chairs:
Yang Yang (University of Electronic Science and Technology of China, China)
Pablo Cesar (CWI & TU Delft, The Netherlands)
Florian Metze (FACEBOOK, Inc., USA)
Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Shenzhen Institute of Artificial Intelligence and Robotics for Society
Presidential Fund from the Chinese University of Hong Kong, Shenzhen

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
