Abstract:
Deep learning is a data-hungry technique that is more effective when being applied to large datasets. However, large-scale annotation datasets are not always available. A...Show MoreMetadata
Abstract:
Deep learning is a data-hungry technique that is more effective when being applied to large datasets. However, large-scale annotation datasets are not always available. A new approach, such as self-supervised learning of which labels can be automatically generated, is essential. Therefore, using self- supervised learning is a new approach to state-of-the-art methods. In this paper, we introduce a new self-supervised method namely video denoising. This method requires an autoencoder model to restore original videos. The second model is proposed, which is called the discriminator. It is used for the quality evaluation of output videos from the autoencoder. By reconstructing videos, the autoencoder is learned both spatial and temporal relations of video frames to process the downstream task easily. In the experiments, we have demonstrated that our model is well transferred to the action recognition task and outperforms state- of-the-art methods on the UCF-101 and HMDB-51 datasets.
Date of Conference: 19-21 August 2021
Date Added to IEEE Xplore: 21 December 2021
ISBN Information:
Print on Demand(PoD) ISSN: 2162-786X