Abstract:
Existing light field salient object detection (LFSOD) models predominantly rely on convolutional neural networks or local attention to process light field data, and consequently have difficulty modeling intra-slice and cross-slice long-range dependencies within focal stacks. In this paper, we examine whether a pure Transformer architecture can address this problem and propose a novel quasi-pure Transformer-based framework for LFSOD, termed TLFNet. TLFNet incorporates Transformer-based fusion modules (PGFormer) along with an edge enhancement module. The PGFormer employs a perpendicular self-attention (PSA) mechanism to capture long-range dependencies along both the cross-slice and intra-slice axes of the focal stack, and integrates multi-modal features using a guided feature fusion (GFF) module. To address the blurry edges that arise from the Transformer-based encoder-decoder architecture, the edge enhancement module combines detailed texture and body information and employs focal loss to improve the edge precision of salient objects. TLFNet is a nearly pure Transformer-based approach: approximately 99.01% of its parameters belong to the Transformer, while the edge enhancement module significantly boosts accuracy with only around 0.99% of the parameters. Comprehensive benchmarks demonstrate that TLFNet outperforms 14 light field models and achieves new state-of-the-art performance. Finally, we present a new application of TLFNet that combines it with the deep autofocus technique proposed by Herrmann et al. (2020), leading to light field salient object autofocus (LFSOA). LFSOA aims to identify and output the focal slice in which a salient object is in focus while keeping the irrelevant background blurred (out of focus), yielding an autonomous bokeh effect in photography. The code for the model and application will be publicly available at https://github.com/jiangyao-scu/TLFNet.
Published in: IEEE Transactions on Image Processing ( Volume: 33)
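
The abstract states that the edge enhancement module employs focal loss, without giving its exact form or hyperparameters. As a rough illustration only, the following is a minimal sketch of the standard binary focal loss applied to per-pixel edge probabilities. The function names, the defaults `gamma=2.0` and `alpha=0.25` (the commonly used values from the original focal loss formulation), and the per-pixel averaging are assumptions made here, not details confirmed for TLFNet.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard binary focal loss for a single pixel.

    p: predicted probability that the pixel is an edge (0..1)
    y: ground-truth label, 1 for edge and 0 for non-edge
    gamma, alpha: assumed defaults, not taken from the TLFNet paper
    """
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights pixels that are already classified well
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average the per-pixel focal loss over flattened prediction and label lists."""
    pairs = list(zip(probs, labels))
    return sum(binary_focal_loss(p, y, gamma, alpha) for p, y in pairs) / len(pairs)


if __name__ == "__main__":
    # A confident correct edge pixel contributes far less than a missed one.
    print(binary_focal_loss(0.9, 1))
    print(binary_focal_loss(0.1, 1))
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy, which is one way to see that the focusing term is what shifts the weight toward hard pixels such as thin object edges.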