Spatio-Temporal Self-Supervision Enhanced Transformer Networks for Action Recognition | IEEE Conference Publication | IEEE Xplore