Abstract:
Surrounding contexts are generally perceived as interfering with object tracking in satellite videos, leading to model drift. From another perspective, they can also be seen as reference objects for the tracked target, and the dynamic interactions between them can provide essential information. In this article, a high-order relation learning transformer (HRLT) is proposed for satellite video object tracking. It not only models the high-order interactions of different target-context pairs but also reasons about the associations between these high-order relations across multiple frames. First, a spatial high-order relation reasoning (SHR2) module is designed to model the high-order interactions between the target and scene contexts. Second, a temporal high-order relation reasoning (THR2) module is proposed to associate and reason about these spatial high-order relations across multiple frames. Third, historical high-order relations are collected to provide a richer basis for reasoning about the current-frame prediction. Finally, qualitative and quantitative evaluations are performed on the SV248S, SkySat, and VISO datasets. The results show that HRLT outperforms 20 popular methods in different challenging scenarios.
Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume: 62)