Inter-Intra Cross-Modality Self-Supervised Video Representation Learning by Contrastive Clustering | IEEE Conference Publication | IEEE Xplore