Abstract:
Saliency prediction in traditional images and videos has drawn extensive research interest in recent years, but few works address saliency prediction over 360° videos, and those that do focus on directly predicting fixations over the whole panorama. When viewing a 360° video, however, a person can only observe the content in her viewport, meaning that only a fraction of the 360° scene is visible at any given time. In this paper, we study human attention over the viewport of 360° videos and propose a novel visual saliency model, dubbed viewport saliency, to predict fixations over 360° videos. We make two contributions. First, we find that where people look is affected by both the content and the location of the viewport in a 360° video; we verify this over 200+ 360° videos viewed by 30+ subjects across two recent benchmark databases. Second, we propose a Multi-Task Deep Neural Network (MT-DNN) method for Viewport Saliency (VS) prediction in 360° video, which takes the content and location of the viewport as input. Extensive experiments and analyses show that our method outperforms other state-of-the-art methods on this task. In particular, on the two recent 360° video databases, our MT-DNN raises the average CC score by 0.149 and 0.205 compared to the SalGAN and DeepVS methods, respectively.
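For context, the CC score reported above is the standard Pearson linear correlation coefficient between a predicted saliency map and the ground-truth fixation density map, a common saliency evaluation metric. Below is a minimal sketch of the metric; the function name cc_score is illustrative and not taken from the paper.

```python
import numpy as np

def cc_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pearson linear correlation coefficient (CC) between a predicted
    saliency map and a ground-truth fixation density map.

    Both maps are flattened and standardized before correlating;
    CC = 1 means perfect linear agreement, 0 means no correlation.
    """
    p = pred.astype(np.float64).ravel()
    g = gt.astype(np.float64).ravel()
    # Small epsilon guards against division by zero on flat maps.
    p = (p - p.mean()) / (p.std() + 1e-12)
    g = (g - g.mean()) / (g.std() + 1e-12)
    return float(np.mean(p * g))
```

A reported gain of 0.149 in average CC, as cited for MT-DNN over SalGAN, therefore reflects a stronger linear agreement between predicted and ground-truth fixation maps averaged over a database.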
Published in: IEEE Transactions on Multimedia (Volume: 23)