Multi-Scale Transformer Network for Saliency Prediction on 360-Degree Images | IEEE Conference Publication | IEEE Xplore


Abstract:

The latest methods for saliency prediction on 360° images show that better results can be obtained using equirectangular projection (ERP) images as input. Due to their limited receptive field, existing convolution-based networks cannot capture long-range information in complex 360° images. Although the transformer has an innate ability to capture long-range correlations through self-attention, its requirement for large datasets limits its application to saliency prediction on 360° images. In this paper, we present a novel Multi-scale Transformer framework for Saliency prediction on 360° images (MTSal360). The Multi-scale Transformer Module (MTM) is designed to aggregate contextual long-range information; it includes a Convolutional Positional Encoder (CPE) that enables the model to be trained and tested on the cubic and ERP formats separately, which addresses the shortage of training data. Experiments on two public datasets show that MTSal360 achieves better results than state-of-the-art methods.
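The abstract does not specify how the CPE is built. As a rough illustration of why a convolution-based positional encoding can be applied to inputs of different shapes (for example a square cube face and a 2:1 ERP image), the following minimal sketch adds a depthwise 3x3 convolution of the token grid back onto the tokens. The function name, the 3x3 kernel size, the zero padding, and the residual addition are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def conv_positional_encoding(tokens, kernel):
    """Hypothetical convolutional positional encoding (not the paper's CPE).

    tokens: (H, W, C) grid of token features.
    kernel: (3, 3, C) depthwise weights, one 3x3 filter per channel.
    Returns tokens plus a depthwise-convolved copy of themselves.
    """
    h, w, _ = tokens.shape
    # Zero padding keeps the output grid the same size as the input grid.
    padded = np.pad(tokens, ((1, 1), (1, 1), (0, 0)))
    pos = np.zeros_like(tokens)
    for dy in range(3):
        for dx in range(3):
            # Each kernel tap scales a shifted view of the padded grid.
            pos += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return tokens + pos

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 4))      # shared weights, independent of H and W
cube_face = rng.normal(size=(8, 8, 4))   # square grid, as for one cube face
erp = rng.normal(size=(8, 16, 4))        # 2:1 grid, as for an ERP image

cube_out = conv_positional_encoding(cube_face, kernel)
erp_out = conv_positional_encoding(erp, kernel)
print(cube_out.shape, erp_out.shape)
```

Because the kernel has no dependence on the grid height or width, the same weights apply to both layouts, whereas a learned table of absolute position embeddings would be tied to one fixed token count.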
Date of Conference: 08-11 October 2023
Date Added to IEEE Xplore: 11 September 2023
Conference Location: Kuala Lumpur, Malaysia

