Abstract:
Learning a large-scale regression model has proven to be one of the most successful approaches to visual tracking, as in recent correlation filter (CF)-based trackers. Unlike conventional CF-based algorithms, in which the regression model is solved from circulant training samples, we propose learning linear regression models via a single convolutional layer trained with gradient descent (GD). In our convolution-based approach, the samples are cropped from an image in a sliding-window manner rather than being circularly shifted from one base sample. As a result, the abundant background context in the images can be fully exploited to learn a robust tracker. The proposed tracker is based on two independent regression models: a holistic regression model and a texture regression model. The holistic regression model is trained on the entire object patch to predict the object location, whereas the texture regression model is trained on local object textures. The foreground map output by the texture regression model not only helps improve location prediction in the case of large variations, but is also an important cue for estimating the object size: we estimate the size by optimizing a novel objective function based on object-background contrast. Extensive experiments on four popular visual tracking datasets (OTB-50, OTB-100, VOT-2016, and TempleColor) show that the proposed algorithm achieves outstanding performance and outperforms most CF-based trackers.
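To make the core idea concrete, the following is a minimal sketch of fitting a linear regression filter with a single "valid" convolution and plain gradient descent, where every sliding-window crop of the frame acts as one training sample. It is an illustration under stated assumptions, not the authors' implementation: the toy frame, the Gaussian regression target, the ridge penalty, the conservative step size, and all function names (`correlate_valid`, `make_toy_frame`, `gaussian_label`, `train_filter`) are hypothetical choices made for this example.

```python
import math
import random


def correlate_valid(image, kernel):
    """Slide `kernel` over `image` without padding.

    Each output value is the filter response for one sliding-window crop.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def make_toy_frame(size=16, obj=5, top=6, left=6, seed=0):
    """Assumed toy data: noisy background with a brighter square 'object'."""
    rng = random.Random(seed)
    frame = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]
    for i in range(top, top + obj):
        for j in range(left, left + obj):
            frame[i][j] += 1.0
    return frame


def gaussian_label(h, w, cy, cx, sigma):
    """Assumed regression target: a Gaussian peaked at the object's window."""
    return [[math.exp(-((i - cy) ** 2 + (j - cx) ** 2) / (2 * sigma ** 2))
             for j in range(w)]
            for i in range(h)]


def train_filter(image, label, kh, kw, steps=100, weight_decay=1e-3):
    """Gradient descent on 0.5*||response - label||^2 + 0.5*wd*||kernel||^2."""
    kernel = [[0.0] * kw for _ in range(kh)]
    # Conservative step size (an assumption of this sketch): the largest
    # curvature of the loss is at most kh*kw*sum(image^2) + weight_decay.
    energy = sum(v * v for row in image for v in row)
    lr = 1.0 / (kh * kw * energy + weight_decay)
    losses = []
    for _ in range(steps):
        response = correlate_valid(image, kernel)
        residual = [[r - y for r, y in zip(r_row, y_row)]
                    for r_row, y_row in zip(response, label)]
        data_term = 0.5 * sum(v * v for row in residual for v in row)
        reg_term = 0.5 * weight_decay * sum(v * v for row in kernel for v in row)
        losses.append(data_term + reg_term)
        # The data-term gradient w.r.t. the kernel is the correlation of the
        # image with the residual map (one term per sliding-window sample).
        grad = correlate_valid(image, residual)
        kernel = [[kernel[a][b] - lr * (grad[a][b] + weight_decay * kernel[a][b])
                   for b in range(kw)]
                  for a in range(kh)]
    return kernel, losses


frame = make_toy_frame()
# A 5x5 filter on a 16x16 frame yields a 12x12 response; the window that
# exactly covers the toy object starts at row 6, column 6.
label = gaussian_label(12, 12, cy=6, cx=6, sigma=1.5)
kernel, losses = train_filter(frame, label, kh=5, kw=5)
response = correlate_valid(frame, kernel)
```

Because the training samples here are genuine crops of the frame, the background pixels around the object enter the loss directly, which is the contrast with circularly shifting a single base sample. The step size is deliberately cautious, so the loss decreases steadily but slowly; a practical tracker would presumably use a framework convolution layer and a tuned optimizer.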
Published in: IEEE Transactions on Multimedia (Volume: 21, Issue: 1, January 2019)