research-article

Predicting Future Instance Segmentation with Contextual Pyramid ConvLSTMs

Authors:
Jiangxin Sun, Sun Yat-sen University, Guangzhou, China
Jiafeng Xie, Sun Yat-sen University, Guangzhou, China
Jian-Fang Hu, Sun Yat-sen University, Guangzhou, China
Zihang Lin, Sun Yat-sen University, Guangzhou, China
Jianhuang Lai, Sun Yat-sen University, Guangzhou, China
Wenjun Zeng, Microsoft Research Asia, Beijing, China
Wei-shi Zheng, Sun Yat-sen University, Guangzhou, China

MM '19: Proceedings of the 27th ACM International Conference on Multimedia, October 2019, Pages 2043–2051, https://doi.org/10.1145/3343031.3350949

Published: 15 October 2019

ABSTRACT

Despite remarkable progress in instance segmentation, predicting future instance segmentation remains challenging because future frames are unobserved. Existing methods mainly address this challenge by forecasting pyramid features that represent the unobserved future frames. However, they predict the features of each pyramid level independently and ignore the underlying structural relationship between features at different levels.

In this paper, we propose a novel framework called Contextual Pyramid ConvLSTMs, which contains a set of ConvLSTMs that exploit intra-level spatio-temporal context to predict the features of each individual level. We also add pathway connections among the ConvLSTMs to transmit information between them, which allows the system to capture additional inter-level spatio-temporal context. Experiments show that the proposed method achieves state-of-the-art performance on two video instance segmentation benchmarks for future instance segmentation prediction.
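The wiring described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the pathway direction, the resizing operator, or the channel layout, so the code below assumes a coarse-to-fine pathway, nearest-neighbour 2x upsampling, single-channel feature maps, and random untrained weights. All names (`ConvLSTMCell`, `upsample2x`, `predict_next`) are hypothetical, and the code uses only the Python standard library to keep the focus on how intra-level recurrence and inter-level context combine.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def zeros(h, w):
    return [[0.0] * w for _ in range(h)]


def conv3x3(grid, kernel):
    """Zero-padded 3x3 cross-correlation over a 2-D list; output keeps the input size."""
    h, w = len(grid), len(grid[0])
    out = zeros(h, w)
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di + 1][dj + 1] * grid[y][x]
    return out


def upsample2x(grid):
    """Nearest-neighbour 2x upsampling, used here as the (assumed) inter-level pathway."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


class ConvLSTMCell:
    """Single-channel ConvLSTM cell with an extra context input (hypothetical layout)."""

    def __init__(self, seed):
        rng = random.Random(seed)

        def kernel():
            return [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]

        # Four gates (input, forget, output, candidate) for each of three sources.
        self.wx = [kernel() for _ in range(4)]  # current-level input features
        self.wh = [kernel() for _ in range(4)]  # previous hidden state (intra-level)
        self.wc = [kernel() for _ in range(4)]  # coarser-level context (inter-level)

    def step(self, x, h, c, ctx):
        rows, cols = len(x), len(x[0])
        pre = []
        for g in range(4):
            a, b, d = conv3x3(x, self.wx[g]), conv3x3(h, self.wh[g]), conv3x3(ctx, self.wc[g])
            pre.append([[a[i][j] + b[i][j] + d[i][j] for j in range(cols)] for i in range(rows)])
        new_h, new_c = zeros(rows, cols), zeros(rows, cols)
        for i in range(rows):
            for j in range(cols):
                i_g, f_g, o_g = (sigmoid(pre[k][i][j]) for k in range(3))
                cand = math.tanh(pre[3][i][j])
                new_c[i][j] = f_g * c[i][j] + i_g * cand
                new_h[i][j] = o_g * math.tanh(new_c[i][j])
        return new_h, new_c


def predict_next(pyramid_seq, cells):
    """Run one cell per pyramid level over the observed sequence.

    pyramid_seq: list over time; each item is a list of 2-D grids ordered
    coarse to fine, each level twice the size of the previous one.
    Returns the final hidden state per level as a stand-in for forecast features.
    """
    sizes = [(len(g), len(g[0])) for g in pyramid_seq[0]]
    hs = [zeros(h, w) for h, w in sizes]
    cs = [zeros(h, w) for h, w in sizes]
    for pyramid in pyramid_seq:
        for lvl, x in enumerate(pyramid):
            # The coarsest level has no coarser neighbour, so its context is zero.
            ctx = zeros(*sizes[0]) if lvl == 0 else upsample2x(hs[lvl - 1])
            hs[lvl], cs[lvl] = cells[lvl].step(x, hs[lvl], cs[lvl], ctx)
    return hs


# Toy input: 3 observed time steps, 2 pyramid levels (2x2 and 4x4).
seq = [[[[0.1 * (t + 1)] * 2 for _ in range(2)],
        [[0.1 * (t + 1)] * 4 for _ in range(4)]] for t in range(3)]
cells = [ConvLSTMCell(seed=0), ConvLSTMCell(seed=1)]
pred = predict_next(seq, cells)
print([(len(p), len(p[0])) for p in pred])
```

Setting the `wc` kernels of a cell to zero removes the pathway and reduces the sketch to independent per-level prediction, which is the baseline behaviour the abstract contrasts against.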

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems

Published in
MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN:9781450368896
DOI:10.1145/3343031
General Chairs: Laurent Amsaleg (CNRS-IRISA, France), Benoit Huet (EURECOM, France), Martha Larson (Radboud University and TU Delft, Netherlands)
Program Chairs: Guillaume Gravier (CNRS-IRISA, France), Hayley Hung (Delft University of Technology, Netherlands), Chong-Wah Ngo (City University of Hong Kong, Hong Kong), Wei Tsang Ooi (National University of Singapore, Singapore)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
contextual pyramid convlstms
future segmentation prediction
instance segmentation
Acceptance Rates
MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%