Abstract:
To effectively exploit foreground object structures in remote sensing scene recognition, it is crucial to hierarchically parse foreground objects and learn an invariant feature representation by adding an equivariant regularization (ER) term to the graph capsule network. Traditionally, such equivariance is constructed using group convolutions, which become intractable when composing complex transformations, leading to increased inference time. In addition, global average pooling (GAP) can discard useful information in the captured features. To address these issues, we propose an equivariant attention graph capsule network (EA-GraCaps) in this letter. EA-GraCaps progressively learns important cues of foreground objects and models potential spatial relations among parts in a transformation-equivariant fashion. Specifically, the intragroup capsule layer is first fed to the graph pooling module for preliminary voting, and the intergroup capsules are then input into the dual mixing attention (MA) module to refine the votes for coincidence filtering. With this formulation, our approach characterizes spatial hierarchies between object parts and improves the discriminative ability of the class capsules. Experimental results demonstrate that the proposed EA-GraCaps yields superior classification performance on three widely used benchmarks.
Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 22)
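
The abstract describes a two-stage pipeline: intragroup capsules are pooled into preliminary votes, which are then refined by a mixing attention module to produce class capsules. The sketch below is only an illustration of that data flow under assumed names, shapes, and layer choices (GraphPooling, MixingAttention, EAGraCapsSketch are hypothetical); it is not the authors' implementation.

```python
import torch
import torch.nn as nn


class GraphPooling(nn.Module):
    """Preliminary voting: aggregate intragroup capsules into intergroup votes.
    (Assumed structure for illustration, not the authors' code.)"""
    def __init__(self, in_caps, out_caps, dim):
        super().__init__()
        self.vote = nn.Linear(in_caps * dim, out_caps * dim)
        self.out_caps, self.dim = out_caps, dim

    def forward(self, capsules):                  # capsules: (B, in_caps, dim)
        b = capsules.size(0)
        votes = self.vote(capsules.flatten(1))    # (B, out_caps * dim)
        return votes.view(b, self.out_caps, self.dim)


class MixingAttention(nn.Module):
    """Refine intergroup votes with self-attention (stand-in for the dual MA module)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, votes):                     # votes: (B, out_caps, dim)
        refined, _ = self.attn(votes, votes, votes)
        return self.norm(votes + refined)         # residual refinement of the votes


class EAGraCapsSketch(nn.Module):
    """High-level sketch of the pipeline described in the abstract:
    intragroup capsules -> graph pooling (voting) -> mixing attention (coincidence filtering)."""
    def __init__(self, in_caps=32, out_caps=10, dim=16):
        super().__init__()
        self.pool = GraphPooling(in_caps, out_caps, dim)
        self.attn = MixingAttention(dim)

    def forward(self, intragroup_caps):           # (B, in_caps, dim)
        votes = self.pool(intragroup_caps)        # preliminary voting
        class_caps = self.attn(votes)             # refined class capsules
        return class_caps.norm(dim=-1)            # capsule length as class score


# usage: 2 images, 32 intragroup capsules of dimension 16, 10 scene classes
x = torch.randn(2, 32, 16)
print(EAGraCapsSketch()(x).shape)                 # torch.Size([2, 10])
```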