Abstract:
Object tracking in satellite videos involves various small targets, such as cars and ships. However, because small targets typically lack salient texture features and have low contrast with the background, it is difficult to detect them and to distinguish different target instances, which results in tracking failure. In this letter, we propose a Siamese multidimensional fusion and time-domain coding network (SMTN) with an efficient attention-based multidimensional information fusion (MDF) module and a time-domain information fusion (TDF) module. The MDF module fuses multiscale template-map and search-map information to make the network focus on the target, thus improving target discriminability. In the TDF module, the aggregated temporal information of previous frames is used to adjust the current frame's response map through a local-sensing Metaformer module, which suppresses the responses of similar interfering objects. Unlike other temporal fusion methods for object tracking in satellite videos, the TDF module enables satellite video trackers to be optimized end-to-end and improves the tracking success rate. Experiments demonstrate that the proposed method outperforms state-of-the-art trackers.
Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 20)