Abstract:
Visual tracking is one of the fundamental problems in computer vision. Recently, some deep-learning-based tracking algorithms have achieved record-breaking performance. However, due to the high complexity of neural networks, most deep trackers suffer from low tracking speed and are thus impractical in many real-world applications. Some recently proposed deep trackers with smaller network structures achieve high efficiency, but at the cost of a significant decrease in precision. In this paper, we propose to transfer deep features, originally learned for image classification, to the visual tracking domain. The domain adaptation is achieved via "grafted" auxiliary networks, which are trained by regressing the object location in tracking frames. This adaptation significantly improves tracking performance in both accuracy and efficiency. The resulting deep tracker runs in real time and achieves state-of-the-art accuracy in experiments on two widely adopted benchmarks with more than 100 test videos. Furthermore, the adaptation is also naturally used to introduce the objectness concept into visual tracking. This removes a long-standing target ambiguity in visual tracking tasks, and we demonstrate the empirical superiority of the more well-defined task.
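The core idea of the abstract, keeping a backbone pretrained for classification frozen and "grafting" a small auxiliary head trained to regress the object location, can be sketched in a toy form. Everything below (the random-projection "backbone," the linear regression head, the shapes, and the synthetic data) is an illustrative assumption for exposition, not the paper's actual architecture or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "classification" backbone: a fixed random projection with a ReLU,
# standing in for a pretrained CNN feature extractor (hypothetical stand-in).
W_frozen = rng.standard_normal((128, 64)) / np.sqrt(128)

def extract_features(patches):
    """Frozen backbone: maps image patches (N, 128) to features (N, 64)."""
    return np.maximum(patches @ W_frozen, 0.0)

# Grafted auxiliary head: a linear regressor onto a box (cx, cy, w, h).
# Only this head is trained; the backbone weights are never updated.
W_head = np.zeros((64, 4))

# Synthetic "tracking frames": patches with known box annotations.
patches = rng.standard_normal((256, 128))
boxes = rng.standard_normal((256, 4))

feats = extract_features(patches)

def loss(W):
    return np.mean((feats @ W - boxes) ** 2)

initial = loss(W_head)
lr = 0.02
for _ in range(200):
    # Gradient of the mean-squared regression loss w.r.t. the head only.
    grad = 2.0 * feats.T @ (feats @ W_head - boxes) / len(feats)
    W_head -= lr * grad
final = loss(W_head)
assert final < initial  # the grafted head adapts; the backbone stays frozen
```

The design point this illustrates is that adaptation cost scales with the small head rather than the full network, which is one way a tracker can stay fast while reusing heavyweight classification features.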
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume 29, Issue 9, September 2019)