Reweighting Foveal Visual Representations | IEEE Conference Publication | IEEE Xplore

Abstract:

Biological foveal vision consists of multiple contour regions, determined by their distance from the center of the gaze. Adopting foveal vision in deep neural networks makes it possible to capture different visual features in different regions. Long-range dependencies from the gaze are modeled by global operations (global self-attention and state-space models), and short-range dependencies are perceived by local operations (local self-attention and convolution). Existing visual backbones have improved performance by modeling local and global features of the input images. However, fully modeling foveal vision, which is crucial for representing visual features, has not been well explored. To address this issue, this paper proposes a Reweighting Foveal (RF) mechanism for visual representation that extracts different features from regions defined by their distance from each query position. Regions far from each query position are modeled by pooling self-attention on coarse input, and the nearest regions are perceived by local convolution on fine-grained input. A reweighting module based on softmax attention emphasizes the importance of each region to the model features, letting the model learn the relationships among foveal regions. Based on this design, RF Transformers are built by stacking RF blocks across stages. Extensive experiments are conducted on image classification, object detection, and semantic segmentation. On image classification, RF-1, with 8.5 M parameters and 0.7 GFLOPs, achieves 78.2% Top-1 accuracy, surpassing recent ConvNets and Vision Transformer methods. When the trained RF Transformers are transferred to other tasks, they obtain competitive performance compared to recent backbones while being more efficient.
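The three ingredients named in the abstract (local convolution on fine-grained input, pooling self-attention on coarse input, and softmax-based reweighting across regions) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the mean filter standing in for a learned convolution, the average-pooling size, the absence of learned projections, and the scoring vector `w` are all assumptions made for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_branch(x, k=3):
    # Near region (assumption): a depthwise k x k mean filter with zero
    # padding, standing in for a learned local convolution.
    H, W, C = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + H, dx:dx + W]
    return out / (k * k)

def pooled_attention_branch(x, pool=2):
    # Far region (assumption): every fine-grained position is a query that
    # attends to keys/values taken from an average-pooled (coarse) map.
    # Requires H and W to be divisible by `pool`.
    H, W, C = x.shape
    coarse = x.reshape(H // pool, pool, W // pool, pool, C).mean(axis=(1, 3))
    q = x.reshape(H * W, C)
    kv = coarse.reshape(-1, C)
    attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)
    return (attn @ kv).reshape(H, W, C)

def reweight(branches, w):
    # Reweighting (assumption): score each branch at each position with a
    # hypothetical vector `w`, then softmax across branches so the weights
    # of the regions sum to 1 at every position.
    stack = np.stack(branches, axis=0)      # (R, H, W, C)
    alpha = softmax(stack @ w, axis=0)      # (R, H, W)
    fused = (alpha[..., None] * stack).sum(axis=0)
    return fused, alpha

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))         # toy feature map (H, W, C)
w = rng.standard_normal(16)                 # hypothetical scoring vector
fused, alpha = reweight([local_branch(x), pooled_attention_branch(x)], w)
```

Here `alpha` holds one weight per region and position, so inspecting it shows how much the near and far branches contribute at each location in this toy setting.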
Date of Conference: 18-21 June 2024
Date Added to IEEE Xplore: 19 July 2024
Conference Location: Ulsan, Korea, Republic of
